Parallel Computing on GPUs. Christian Kehl, 01.01.2011
Overview: Basics of Parallel Computing; Brief History of SIMD vs. MIMD Architectures; OpenCL; Common Application Domain; Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP.
Basics of Parallel Computing. Ref.: René Fink, "Untersuchungen zur Parallelverarbeitung mit wissenschaftlich-technischen Berechnungsumgebungen", dissertation, University of Rostock, 2007.
Brief History of SIMD vs. MIMD Architectures: 2004, programmable GPU core via shader technology; 2007, CUDA (Compute Unified Device Architecture) release 1.0; December 2008, first Open Computing Language (OpenCL) specification; March 2009, uniform shaders and first beta releases of OpenCL; August 2009, release and implementation of OpenCL 1.0.
Brief History of SIMD vs. MIMD Architectures, SIMD technologies in GPUs: vector processing (ILLIAC IV); mathematical operation units (ILLIAC IV); pipelining (CRAY-1); local memory caching (CRAY-1); atomic instructions (CRAY-1); synchronized instruction execution and memory access (MasPar).
OpenCL Platform Model: one host plus one or more compute devices; each compute device is composed of one or more compute units; each compute unit is further divided into one or more processing elements.
OpenCL Kernel Execution: total number of work-items = Gx * Gy; size of each work-group = Sx * Sy; the global ID can be computed from the work-group ID and the local ID.
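The index arithmetic on this slide can be sketched in plain Python. This is not OpenCL code, only an illustration under two assumptions: a zero global offset, and a global size that is a multiple of the work-group size. The function names `global_id` and `enumerate_work_items` are made up for the example; in device code the corresponding values come from `get_global_id`, `get_group_id`, `get_local_size` and `get_local_id`.

```python
def global_id(group_id, local_size, local_id):
    """Global index of a work-item along one dimension (zero global offset assumed)."""
    return group_id * local_size + local_id


def enumerate_work_items(gx, gy, sx, sy):
    """List the 2D global IDs of a Gx x Gy index space split into Sx x Sy work-groups."""
    assert gx % sx == 0 and gy % sy == 0, "global size must be a multiple of the group size"
    ids = []
    for wy in range(gy // sy):              # work-group index in y
        for wx in range(gx // sx):          # work-group index in x
            for ly in range(sy):            # local index in y
                for lx in range(sx):        # local index in x
                    ids.append((global_id(wx, sx, lx), global_id(wy, sy, ly)))
    return ids


ids = enumerate_work_items(gx=8, gy=4, sx=4, sy=2)
print(len(ids))  # Gx * Gy work-items in total
```

Every global ID appears exactly once, which is why a kernel can use it directly as an array index.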
OpenCL Memory Management [figures not preserved in this transcript]
OpenCL Memory Model, address spaces: private (private to a work-item); local (local to a work-group); global (accessible by all work-items in all work-groups); constant (read-only global space).
OpenCL Programming Language: every GPU computing technology is natively written in C/C++ (host); host-code bindings exist for several other languages (Fortran, Java, C#, Ruby); device code is written exclusively in standard C plus extensions.
OpenCL Language Restrictions: pointers to functions are not allowed; pointers to pointers are allowed within a kernel, but not as an argument; bit-fields are not supported; variable-length arrays and structures are not supported; recursion is not supported; writes to a pointer of types smaller than 32 bits are not supported; double types are not supported, but reserved; 3D image writes are not supported. Some restrictions are addressed through extensions.
Common Application Domain: multimedia data and tasks are best suited for SIMD processing; multimedia data are sequential byte streams in which each byte is independent; image processing is particularly suited for GPUs; the original GPU task is "compute <several FLOP> for every pixel of the screen" (computer graphics); the task is the same for images, only the FLOPs are different.
Common Application Domain, Image Processing: possible features realizable on the GPU: contrast and luminance configuration; gamma scaling; (pixel-by-pixel) histogram scaling; convolution filtering; edge highlighting; negative image / image inversion; …
Image Processing, Inversion (simple example): implementation and use of a framework for switching between different GPGPU technologies; creation of a command queue for each GPU; reading the GPU kernel from a kernel file on the fly; creation of buffers for the input and output image; memory copy of the input image data to global GPU memory; setting of kernel arguments and kernel execution; memory copy of the GPU output buffer data to a new image.
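The per-pixel work of the inversion example can be sketched in plain Python. The slides do not show the kernel itself, so this is an assumption about its shape: one work-item per 8-bit sample, each writing `255 - value`. The names `invert_kernel` and `invert_image` are hypothetical, and the host-side steps (command queue, buffers, copies) are only hinted at in comments.

```python
def invert_kernel(gid, src, dst, max_value=255):
    """Work done by a single work-item: invert one 8-bit sample."""
    dst[gid] = max_value - src[gid]


def invert_image(pixels):
    """Host-side flow, simulated sequentially on the CPU."""
    src = list(pixels)            # stands in for the copy to global GPU memory
    dst = [0] * len(src)          # stands in for the output buffer
    for gid in range(len(src)):   # on a GPU these iterations run in parallel
        invert_kernel(gid, src, dst)
    return dst                    # stands in for the copy back to the host


print(invert_image([0, 100, 255]))
```

Because no work-item reads what another one writes, the loop order does not matter, which is what makes the task a good fit for SIMD hardware.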
Image Processing, Inversion: evaluated and confirmed minimum speed-up, G80 GPU (OpenCL) vs. 8-core CPU (OpenMP) = 4 : 1.
GPU Computing Case Study: Monte Carlo-Study of a Spring-Mass-System on GPUs
MC Study of a SMS using OpenCL and OpenMP: Task; Modelling; Euler as simple ODE solver; Existing MIMD Solutions; An SIMD Approach; OpenMP; Result Plots; Speed-Up Study; Parallelization Conclusions; Summary.
Task: a spring-mass system is defined by a differential equation; the behavior of the system must be simulated over varying damping values; therefore: numerical solution in t, t ∈ [0.0 … 2] s, for a step size h = 1/1000; analysis of computation time and speed-up for different compute architectures.
Task: based on Simulation News Europe (SNE) CP2: 1000 simulation iterations over the simulation horizon with generated damping values (Monte Carlo study); consecutive averaging for s(t), t ∈ [0 … 2] s; h = 0.01, i.e. 200 steps.
Task: too lightweight on present architectures, hence a modification: 5000 iterations with Monte Carlo; h = 0.001, i.e. 2000 steps. Aim of the analysis: knowledge about the spring behavior for different damping values (trajectory array).
Task: simple spring-mass system; d … damping constant; c … spring constant; movement equation derived from Newton's 2nd law; modelling needed: free-body diagram ("Massenfreischnitt"), the mass is moved, force balancing; equation [not preserved in this transcript].
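The equation itself did not survive in this transcript. For a damped spring-mass system without external forcing, the force balance usually takes the following form. This is an assumption about what the slide showed, not a reconstruction of it; the mass symbol m and the displacement x(t) are introduced here, only d and c are named on the slide.

```latex
% Force balance on the moved mass: inertia + damping + spring force = 0
m\,\ddot{x}(t) + d\,\dot{x}(t) + c\,x(t) = 0
```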
Modelling: numerical integration based on a 2nd-order differential equation; a DE of order n becomes n DEs of 1st order.
Modelling: transformation by substitution.
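The substitution can be written out as follows, assuming the usual unforced damped oscillator $m\ddot{x} + d\dot{x} + c\,x = 0$ (the slide's own formulas are not preserved, and the names $x_1$, $x_2$ and m are chosen here for illustration).

```latex
% Substitution: position and velocity become the two state variables
x_1 = x, \qquad x_2 = \dot{x}
% Resulting system of two first-order equations
\dot{x}_1 = x_2, \qquad
\dot{x}_2 = -\frac{c}{m}\,x_1 - \frac{d}{m}\,x_2
```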
5000 iterations
Euler as simple ODE solver: numerical integration by the explicit Euler method.
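A minimal Python sketch of explicit Euler for one damping value, assuming the unforced system m·x'' + d·x' + c·x = 0 rewritten as two first-order equations. The function name `euler_spring_mass` and the default values for m, c and the initial state are illustrative assumptions and are not taken from the slides; only h = 0.001 and the 2 s horizon come from the task description.

```python
def euler_spring_mass(d, m=450.0, c=9000.0, x0=0.0, v0=0.1, h=0.001, t_end=2.0):
    """Integrate position x and velocity v with explicit Euler; return all positions."""
    steps = round(t_end / h)              # 2000 steps for h = 0.001 over 2 s
    x, v = x0, v0
    trajectory = [x]
    for _ in range(steps):
        a = -(c * x + d * v) / m          # acceleration from the force balance
        x, v = x + h * v, v + h * a       # both updates use the old state
        trajectory.append(x)
    return trajectory


traj = euler_spring_mass(d=1000.0)
print(len(traj))
```

The tuple assignment matters: updating `x` first and then using the new `x` for `v` would turn the scheme into semi-implicit Euler, a different method.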
Existing MIMD Solutions: the approach cannot be applied to GPU architectures. MIMD requirements: each PE has its own instruction flow; each PE can access RAM individually. GPU architecture, i.e. SIMD: each PE computes the same instruction at the same time; each PE has to be at the same instruction for accessing RAM. Therefore: development of an SIMD approach.
An SIMD Approach: S.P./R.F.: simultaneous execution of sequential simulations with varying d parameter on spatially distributed PEs; averaging depends on trajectories. C.K.: simultaneous computation with all d parameters for time t_n, iterative repetition until t_end; averaging depends on steps.
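The two loop orderings can be contrasted in a short Python sketch. It runs sequentially on the CPU and only illustrates the structure; the constants, the uniform damping range and the function names are assumptions made for the example, not values from the slides. `average_by_trajectory` follows the trajectory-wise ordering, `average_by_step` the step-wise ordering in which one time step is applied to all d parameters before moving on.

```python
import random

M, C, H, STEPS = 450.0, 9000.0, 0.001, 2000   # illustrative values only


def step(x, v, d):
    """One explicit Euler step for a single damping value d."""
    a = -(C * x + d * v) / M
    return x + H * v, v + H * a


def average_by_trajectory(dampings):
    """Outer loop over d: finish each simulation, accumulate its trajectory."""
    total = [0.0] * (STEPS + 1)
    for d in dampings:
        x, v = 0.0, 0.1
        total[0] += x
        for n in range(1, STEPS + 1):
            x, v = step(x, v, d)
            total[n] += x
    return [s / len(dampings) for s in total]


def average_by_step(dampings):
    """Outer loop over time: advance all d values together, average after each step."""
    xs = [0.0] * len(dampings)
    vs = [0.1] * len(dampings)
    mean = [sum(xs) / len(xs)]
    for _ in range(STEPS):
        # On a GPU this update would be one kernel launch, one work-item per d.
        state = [step(x, v, d) for x, v, d in zip(xs, vs, dampings)]
        xs = [s[0] for s in state]
        vs = [s[1] for s in state]
        mean.append(sum(xs) / len(xs))
    return mean


rng = random.Random(0)
dampings = [rng.uniform(800.0, 1200.0) for _ in range(50)]
a = average_by_trajectory(dampings)
b = average_by_step(dampings)
print(max(abs(p - q) for p, q in zip(a, b)))
```

Both orderings produce the same averaged trajectory; they differ only in which loop is outermost, and therefore in which dimension can be handed to lock-step processing elements.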
OpenMP: parallelization technology based on the shared-memory principle; synchronization is hidden from the developer; thread management is controllable; for System V-based OSs: parallelization by process forking; for Windows-based OSs: parallelization by WinThread creation (AMD study / Intel tech paper).
OpenMP: in C/C++, pragma-based preprocessor directives; in C#, represented by ParallelLoops; more than just parallelizing loops (AMD tech report). Literature: AMD/Intel tech papers; Thomas Rauber, "Parallele Programmierung"; Barbara Chapman, "Using OpenMP: Portable Shared Memory Parallel Programming".
Result Plot: resulting trajectory for all technologies [plot not preserved in this transcript]
Speed-Up Study: OpenMP, own study, comparison CPU/GPU. SIMD single: presented SIMD approach on the CPU. SIMD OpenMP: presented SIMD approach parallelized on the CPU. SIMD OpenCL: control of the number of executing units is not possible, therefore only 1 value.
Speed-Up Study [plot not preserved in this transcript]; series: SIMD OpenCL, SIMD single, MIMD single, SIMD OpenMP, MIMD OpenMP
Parallelization Conclusions: the problem is unsuited for SIMD parallelization; on-GPU reduction is too time-expensive, therefore Euler computation on the GPU and average computation on the CPU; the most time-intensive operation is the memory copy between GPU and main memory; for more complex problems or different ODE solver procedures, the speed-up behavior can change.
Parallelization Conclusions: the MIMD approach of S.P./R.F. is efficient for SNE CP2; an OpenMP realization is possible (and done) for both the MIMD and the SIMD approach; the OpenMP MIMD realization shows almost linear speed-up; setting more threads than PEs physically available leads to significant thread overhead; OpenMP automatically matches the number of threads to the physically available PEs for dynamic assignment.
Summary: the task can be solved on CPUs and GPUs; GPU computing requires new approaches and algorithm porting; although GPUs have a massive number of cores operating in parallel, a speed-up is not possible for every application domain.
Summary. Advantages of GPU computing: very fast and scalable for suited problems (e.g. multimedia); cheap HPC technology in comparison to scientific supercomputers; energy-efficient; massive computing power in a small size. Disadvantages of GPU computing: limited instruction set; strictly SIMD; SIMD algorithm development is hard; no execution supervision (e.g. segmentation/page fault).

 

Plus de Christian Kehl (20)

From noisy object surface scans to conformal unstructured grids of multiple m...
From noisy object surface scans to conformal unstructured grids of multiple m...From noisy object surface scans to conformal unstructured grids of multiple m...
From noisy object surface scans to conformal unstructured grids of multiple m...
 
Cuberilles Statistical Volume Visualisation for Medical and Geological Data
Cuberilles Statistical Volume Visualisation for Medical and Geological DataCuberilles Statistical Volume Visualisation for Medical and Geological Data
Cuberilles Statistical Volume Visualisation for Medical and Geological Data
 
Mobile Outcrop Geology using tablets
Mobile Outcrop Geology using tabletsMobile Outcrop Geology using tablets
Mobile Outcrop Geology using tablets
 
Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieva...
Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieva...Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieva...
Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieva...
 
Distributed Rendering and Collaborative User Navigation- and Scene Manipulati...
Distributed Rendering and Collaborative User Navigation- and Scene Manipulati...Distributed Rendering and Collaborative User Navigation- and Scene Manipulati...
Distributed Rendering and Collaborative User Navigation- and Scene Manipulati...
 
Conformal multi-material mesh generation from labelled medical volumes (Dec 2...
Conformal multi-material mesh generation from labelled medical volumes (Dec 2...Conformal multi-material mesh generation from labelled medical volumes (Dec 2...
Conformal multi-material mesh generation from labelled medical volumes (Dec 2...
 
Interactive Simulation and Visualization of Large-Scale Flooding Scenarios (J...
Interactive Simulation and Visualization of Large-Scale Flooding Scenarios (J...Interactive Simulation and Visualization of Large-Scale Flooding Scenarios (J...
Interactive Simulation and Visualization of Large-Scale Flooding Scenarios (J...
 
Efficient Navigation in Temporal, Multi-Dimensional Point Sets (April 2013)
Efficient Navigation in Temporal, Multi-Dimensional Point Sets (April 2013)Efficient Navigation in Temporal, Multi-Dimensional Point Sets (April 2013)
Efficient Navigation in Temporal, Multi-Dimensional Point Sets (April 2013)
 
Smooth, Interactive Rendering and On-line Modification of Large-Scale, Geospa...
Smooth, Interactive Rendering and On-line Modification of Large-Scale, Geospa...Smooth, Interactive Rendering and On-line Modification of Large-Scale, Geospa...
Smooth, Interactive Rendering and On-line Modification of Large-Scale, Geospa...
 
WP 4 – Interactive simulation and 3D visualization for water policy developme...
WP 4 – Interactive simulation and 3D visualization for water policy developme...WP 4 – Interactive simulation and 3D visualization for water policy developme...
WP 4 – Interactive simulation and 3D visualization for water policy developme...
 
Topology-conform segmented volume meshing of volume images (Oct 2012)
Topology-conform segmented volume meshing of volume images (Oct 2012)Topology-conform segmented volume meshing of volume images (Oct 2012)
Topology-conform segmented volume meshing of volume images (Oct 2012)
 
Master Thesis: Conformal multi-material mesh generation from labelled medical...
Master Thesis: Conformal multi-material mesh generation from labelled medical...Master Thesis: Conformal multi-material mesh generation from labelled medical...
Master Thesis: Conformal multi-material mesh generation from labelled medical...
 
nteractive visual analysis of flood scnarios using large-scale LiDAR point cl...
nteractive visual analysis of flood scnarios using large-scale LiDAR point cl...nteractive visual analysis of flood scnarios using large-scale LiDAR point cl...
nteractive visual analysis of flood scnarios using large-scale LiDAR point cl...
 
LiDAR acquisition
LiDAR acquisitionLiDAR acquisition
LiDAR acquisition
 
Fluid simulation
Fluid simulationFluid simulation
Fluid simulation
 
MPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video EncodingMPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video Encoding
 
Depth image recognition using isomorphic graph theory
Depth image recognition using isomorphic graph theoryDepth image recognition using isomorphic graph theory
Depth image recognition using isomorphic graph theory
 
Graph theory - Traveling Salesman and Chinese Postman
Graph theory - Traveling Salesman and Chinese PostmanGraph theory - Traveling Salesman and Chinese Postman
Graph theory - Traveling Salesman and Chinese Postman
 
Computer Graphics Modellering engels
Computer Graphics Modellering engelsComputer Graphics Modellering engels
Computer Graphics Modellering engels
 
Video-Konvertierung über GPGPU mit RIA-FrontEnd
Video-Konvertierung über GPGPU mit RIA-FrontEndVideo-Konvertierung über GPGPU mit RIA-FrontEnd
Video-Konvertierung über GPGPU mit RIA-FrontEnd
 

Dernier

Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsPooky Knightsmith
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 

Dernier (20)

Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 

GPU Computing

  • 1. Parallel Computing on GPUs. Christian Kehl, 01.01.2011
  • 2. Overview: Basics of Parallel Computing; Brief History of SIMD vs. MIMD Architectures; OpenCL; Common Application Domain; Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP
  • 3. Basics of Parallel Computing. Ref.: René Fink, „Untersuchungen zur Parallelverarbeitung mit wissenschaftlich-technischen Berechnungsumgebungen“, Diss. Uni Rostock, 2007
  • 4. Basics of Parallel Computing
  • 5. Overview: Basics of Parallel Computing; Brief History of SIMD vs. MIMD Architectures; OpenCL; Common Application Domain; Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP
  • 6. Brief History of SIMD vs. MIMD Architectures
  • 7. Brief History of SIMD vs. MIMD Architectures
  • 8. Brief History of SIMD vs. MIMD Architectures
  • 9. Brief History of SIMD vs. MIMD Architectures: 2004 – programmable GPU core via shader technology; 2007 – CUDA (Compute Unified Device Architecture) release 1.0; December 2008 – first Open Computing Language (OpenCL) spec; March 2009 – uniform shader, first beta releases of OpenCL; August 2009 – release and implementation of OpenCL 1.0
  • 10. Brief History of SIMD vs. MIMD Architectures. SIMD technologies in GPUs: vector processing (ILLIAC IV); mathematical operation units (ILLIAC IV); pipelining (CRAY-1); local memory caching (CRAY-1); atomic instructions (CRAY-1); synchronized instruction execution and memory access (MASPAR)
  • 11. Overview: Basics of Parallel Computing; Brief History of SIMD vs. MIMD Architectures; OpenCL; Common Application Domain; Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP
  • 12. OpenCL Platform Model: one host plus one or more compute devices; each compute device is composed of one or more compute units; each compute unit is further divided into one or more processing elements
  • 13. OpenCL Kernel Execution: total number of work-items = Gx * Gy; size of each work-group = Sx * Sy; the global ID can be computed from the work-group ID and the local ID
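The relation between work-group ID, local ID and global ID on the kernel-execution slide can be sketched in a few lines. This is a plain-Python illustration, not OpenCL code: the function name `global_id` and the example sizes are assumptions chosen for illustration, and a zero global offset is assumed.

```python
def global_id(group_id, local_size, local_id):
    """Global ID along one dimension, assuming a zero global offset."""
    return group_id * local_size + local_id

# Hypothetical 2D index space: Gx * Gy work-items in work-groups of Sx * Sy.
Gx, Gy = 8, 4
Sx, Sy = 4, 2

ids = set()
for wg_y in range(Gy // Sy):          # work-group index in y
    for wg_x in range(Gx // Sx):      # work-group index in x
        for ly in range(Sy):          # local index in y
            for lx in range(Sx):      # local index in x
                ids.add((global_id(wg_x, Sx, lx), global_id(wg_y, Sy, ly)))

# Each (x, y) pair of the index space is produced exactly once.
total_work_items = len(ids)
```

In this sketch every work-item ends up with a unique pair of global coordinates, which is what allows a kernel to pick "its" element of the input data.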
  • 16. OpenCL Memory Model, address spaces: private – private to a work-item; local – local to a work-group; global – accessible by all work-items in all work-groups; constant – read-only global space
  • 17. OpenCL Programming Language: every GPU computing technology is natively written in C/C++ (host); host-code bindings to several other languages exist (Fortran, Java, C#, Ruby); device code is written exclusively in standard C plus extensions
  • 18. OpenCL Language Restrictions: pointers to functions not allowed; pointers to pointers allowed within a kernel, but not as an argument; bit-fields not supported; variable-length arrays and structures not supported; recursion not supported; writes to a pointer of types less than 32-bit not supported; double types not supported, but reserved; 3D image writes not supported; some restrictions are addressed through extensions
  • 19. Overview: Basics of Parallel Computing; Brief History of SIMD vs. MIMD Architectures; OpenCL; Common Application Domain; Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP
  • 20. Common Application Domain: multimedia data and tasks are best suited for SIMD processing; multimedia data – sequential byte streams, each byte independent; image processing is particularly suited for GPUs; original GPU task: "compute <several FLOP> for every pixel of the screen" (computer graphics); same task for images, only the FLOPs are different
  • 21. Common Application Domain – Image Processing. Possible features realizable on the GPU: contrast and luminance configuration; gamma scaling; (pixel-by-pixel) histogram scaling; convolution filtering; edge highlighting; negative image / image inversion; …
  • 22. Image Processing – Inversion. Simple example: inversion; implementation and use of a framework for switching between different GPGPU technologies; creation of a command queue for each GPU; reading the GPU kernel from a kernel file on the fly; creation of buffers for input and output image; memory copy of input image data to global GPU memory; setting of kernel arguments and kernel execution; memory copy of GPU output buffer data to the new image
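The per-pixel character of the inversion example can be sketched as follows. This is plain Python standing in for the device kernel, not the framework described on the slide: `invert_pixel` plays the role of one work-item, and the 8-bit value range and the tiny test image are assumptions.

```python
MAX_VALUE = 255  # assumption: 8-bit channel values

def invert_pixel(value):
    """Work done by a single work-item: invert one channel value."""
    return MAX_VALUE - value

def invert_image(pixels):
    """Host-side stand-in: apply the same operation to every pixel.

    On a GPU each call would run as an independent work-item; here the
    list comprehension simply visits the pixels one after another.
    """
    return [invert_pixel(p) for p in pixels]

input_image = [0, 64, 128, 255]            # hypothetical flattened grayscale image
output_image = invert_image(input_image)   # [255, 191, 127, 0]
```

Because no pixel depends on any other, the operation maps directly onto the "same instruction for every element" model, which is why it serves as the simple example here.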
  • 23. Image Processing – Inversion: evaluated and confirmed minimum speed-up, G80 GPU (OpenCL) vs. 8-core CPU (OpenMP): 4 : 1
  • 24. GPU Computing Case Study: Monte Carlo Study of a Spring-Mass System on GPUs
  • 25. Overview: Basics of Parallel Computing; Brief History of SIMD vs. MIMD Architectures; OpenCL; Common Application Domain; Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP
  • 26. MC Study of a SMS using OpenCL and OpenMP: Task; Modelling; Euler as simple ODE solver; Existing MIMD Solutions; An SIMD Approach; OpenMP; Result Plots; Speed-Up Study; Parallelization Conclusions; Summary
  • 27. Task: spring-mass system defined by a differential equation; the behavior of the system must be simulated over varying damping values; therefore: numerical solution in t, t ∈ [0.0 … 2] sec, for a step size h = 1/1000; analysis of computation time and speed-up for different compute architectures
  • 28. Task: based on Simulation News Europe (SNE) CP2: 1000 simulation iterations over the simulation horizon with generated damping values (Monte Carlo study); consecutive averaging for s(t), t ∈ [0 … 2] sec; h = 0.01, i.e. 200 steps
  • 29. Task: too lightweight on present architectures -> modification: 5000 iterations with Monte Carlo; h = 0.001, i.e. 2000 steps; aim of analysis: knowledge about spring behavior for different damping values (trajectory array)
  • 30. Task: simple spring-mass system; d … damping constant; c … spring constant; equation of motion derived from Newton's second law; modelling needed -> free-body diagram of the mass ("Massenfreischnitt"): mass is moved, force balancing; equation
  • 31. MC Study of a SMS using OpenCL and OpenMP: Task; Modelling; Euler as simple ODE solver; Existing MIMD Solutions; An SIMD Approach; OpenMP; Result Plots; Speed-Up Study; Parallelization Conclusions; Summary
  • 32. Modelling: numerical integration based on a 2nd-order differential equation; a DE of order n is rewritten as n DEs of 1st order
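As a worked instance of this order reduction, assume the free damped spring-mass system with mass m, damping constant d, spring constant c and displacement s(t), without external forcing. The equation shown on the slides is not part of the transcript, so this standard form is an assumption. Introducing the velocity v as the derivative of s turns the one second-order equation into two first-order equations:

```latex
m\,\ddot{s}(t) + d\,\dot{s}(t) + c\,s(t) = 0
\quad\Longrightarrow\quad
\begin{aligned}
\dot{s}(t) &= v(t), \\
\dot{v}(t) &= -\frac{d\,v(t) + c\,s(t)}{m}.
\end{aligned}
```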
  • 33.
  • 34.
  • 35. Euler as simple ODE solver: numerical integration by the explicit Euler method
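A minimal sketch of the explicit Euler method for such a system is given below, using the step size h = 0.001 and the horizon of 2 s from the task description. The model form (free damped oscillator), the function name `euler_spring_mass`, and the default values for c, m and the initial state are assumptions for illustration, not values taken from the slides.

```python
import math

def euler_spring_mass(d, c=1.0, m=1.0, s0=1.0, v0=0.0, h=0.001, t_end=2.0):
    """Integrate s' = v, v' = -(d*v + c*s)/m with the explicit Euler method.

    Returns the list of displacements s(t_0), ..., s(t_end).
    """
    steps = round(t_end / h)
    s, v = s0, v0
    trajectory = [s]
    for _ in range(steps):
        a = -(d * v + c * s) / m        # acceleration from the current state
        s, v = s + h * v, v + h * a     # both updates use the old state
        trajectory.append(s)
    return trajectory

undamped = euler_spring_mass(d=0.0)
damped = euler_spring_mass(d=1.0)

# For d = 0 and the assumed defaults the exact solution is s(t) = cos(t),
# which gives a simple reference value at t = 2 s.
reference = math.cos(2.0)
```

With these assumed parameters the undamped result stays close to the reference value, while the damped trajectory decays, which is the kind of behavior the study examines over many damping values.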
  • 36. MC Study of a SMS using OpenCL and OpenMP: Task; Modelling; Euler as simple ODE solver; Existing MIMD Solutions; An SIMD Approach; OpenMP; Result Plots; Speed-Up Study; Parallelization Conclusions; Summary
  • 37. Existing MIMD Solutions
  • 38. Existing MIMD Solutions: approach cannot be applied to GPU architectures; MIMD requirements: each PE has its own instruction flow, each PE can access RAM individually; GPU architecture -> SIMD: each PE computes the same instruction at the same time, each PE has to be at the same instruction for accessing RAM; therefore: development of an SIMD approach
  • 39. MC Study of a SMS using OpenCL and OpenMP: Task; Modelling; Euler as simple ODE solver; Existing MIMD Solutions; An SIMD Approach; OpenMP; Result Plots; Speed-Up Study; Parallelization Conclusions; Summary
  • 40. An SIMD Approach. S.P./R.F.: simultaneous execution of sequential simulations with varying d parameter on spatially distributed PEs; averaging dependent on trajectories. C.K.: simultaneous computation with all d parameters for time t_n, iterative repetition until t_end; averaging dependent on steps
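The difference between the two orderings on this slide can be sketched sequentially in plain Python. This is an illustration of the loop structure only, not the author's OpenCL or OpenMP code: the function names, the model parameters, the damping range and the reduced sample of 100 damping values are all assumptions; the step size h = 0.01 and the 200 steps follow the SNE CP2 task description.

```python
import random

H, STEPS = 0.01, 200                 # step size and step count (SNE CP2 task)
C, M, S0, V0 = 1.0, 1.0, 1.0, 0.0    # assumed model parameters and initial state

def average_by_trajectory(dampings):
    """Trajectory-major ordering: simulate each damping value to the end,
    accumulating its whole trajectory into a running sum."""
    total = [0.0] * (STEPS + 1)
    for d in dampings:
        s, v = S0, V0
        total[0] += s
        for n in range(1, STEPS + 1):
            s, v = s + H * v, v - H * (d * v + C * s) / M
            total[n] += s
    return [x / len(dampings) for x in total]

def average_by_step(dampings):
    """Step-major ordering: advance all damping values by one step
    (the data-parallel part), then average that step."""
    s = [S0] * len(dampings)
    v = [V0] * len(dampings)
    mean = [sum(s) / len(s)]
    for _ in range(STEPS):
        # The same instruction is applied to every element, as on a SIMD device.
        s, v = (
            [si + H * vi for si, vi in zip(s, v)],
            [vi - H * (d * vi + C * si) / M for d, si, vi in zip(dampings, s, v)],
        )
        mean.append(sum(s) / len(s))
    return mean

rng = random.Random(0)
dampings = [rng.uniform(0.5, 1.5) for _ in range(100)]  # hypothetical range
by_trajectory = average_by_trajectory(dampings)
by_step = average_by_step(dampings)
```

Both orderings yield the same averaged trajectory; what changes is which loop is outermost, and therefore whether the parallel unit of work is a whole simulation run or a single time step across all damping values.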
  • 42. MC Study of a SMS using OpenCL and OpenMP: Task; Modelling; Euler as simple ODE solver; Existing MIMD Solutions; An SIMD Approach; OpenMP; Result Plots; Speed-Up Study; Parallelization Conclusions; Summary
  • 43. OpenMP: parallelization technology based on the shared-memory principle; synchronization hidden from the developer; thread management controllable; for System V-based OS: parallelization by process forking; for Windows-based OS: parallelization by WinThread creation (AMD Study/Intel Tech Paper)
  • 44. OpenMP: in C/C++: pragma-based preprocessor directives; in C# represented by ParallelLoops; more than just parallelizing loops (AMD Tech Report). Literature: AMD/Intel Tech Papers; Thomas Rauber, „Parallele Programmierung“; Barbara Chapman, „Using OpenMP: Portable Shared Memory Parallel Programming“
  • 45. MC Study of a SMS using OpenCL and OpenMP: Task; Modelling; Euler as simple ODE solver; Existing MIMD Solutions; An SIMD Approach; OpenMP; Result Plot; Speed-Up Study; Parallelization Conclusions; Summary
  • 46. Result Plot: resulting trajectory for all technologies
  • 47. MC Study of a SMS using OpenCL and OpenMP: Task; Modelling; Euler as simple ODE solver; Existing MIMD Solutions; An SIMD Approach; OpenMP; Result Plots; Speed-Up Study; Parallelization Conclusions; Summary
  • 48. Speed-Up Study: OpenMP – own study – comparison CPU/GPU; SIMD Single: presented SIMD approach on CPU; SIMD OpenMP: presented SIMD approach parallelized on CPU; SIMD OpenCL: control of the number of executing units not possible, therefore only 1 value
  • 49. Speed-Up Study (plot legend): SIMD OpenCL; SIMD single; MIMD single; SIMD OpenMP; MIMD OpenMP
  • 50. MC Study of a SMS using OpenCL and OpenMP: Task; Modelling; Euler as simple ODE solver; Existing MIMD Solutions; An SIMD Approach; OpenMP; Result Plots; Speed-Up Study; Parallelization Conclusions; Summary
  • 51. Parallelization Conclusions: problem unsuited for SIMD parallelization; on-GPU reduction too time-expensive, therefore: Euler computation on GPU, average computation on CPU; most time-intensive operation: MemCopy between GPU and main memory; for more complex problems or different ODE solver procedures, the speed-up behavior can change
  • 52. Parallelization Conclusions: MIMD approach of S.P./R.F. efficient for SNE CP2; OpenMP realization for MIMD and SIMD approach possible (and done); OpenMP MIMD realization achieves almost linear speed-up; setting more threads than PEs physically available leads to significant thread overhead; OpenMP automatically matches the number of threads to the physically available PEs for dynamic assignment
  • 53. MC Study of a SMS using OpenCL and OpenMP: Task; Modelling; Euler as simple ODE solver; Existing MIMD Solutions; An SIMD Approach; OpenMP; Result Plots; Speed-Up Study; Parallelization Conclusions; Summary
  • 54. Summary: the task can be solved on CPUs and GPUs; for GPU computing, new approaches and algorithm porting are required; although GPUs have a massive number of parallel operating cores, a speed-up is not possible for every application domain
  • 55. Summary. Advantages of GPU computing: for suited problems (e.g. multimedia) very fast and scalable; cheap HPC technology in comparison to scientific supercomputers; energy-efficient; massive computing power in small size. Disadvantages of GPU computing: limited instruction set; strictly SIMD; SIMD algorithm development is hard; no execution supervision (e.g. segmentation/page fault)
  • 56. Overview: Basics of Parallel Computing; Brief History of SIMD vs. MIMD Architectures; OpenCL; Common Application Domain; Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP

Editor's notes

  1. GPU GDRAM is further subdivided, according to the physical architecture of the processing unit