An Introduction to OpenCL™ Using AMD GPUs 
Chris Mason Product Manager, Acceleware September 17, 2014
© 2014 Acceleware Ltd. Reproduction or distribution strictly prohibited. 
An Introduction to OpenCL Using AMD GPUs 
About Acceleware 
Programmer Training 
–OpenCL, CUDA, OpenMP 
–Over 100 courses taught 
–http://acceleware.com/training 
Consulting Services 
–Completed projects for: Oil & Gas, Medical, Finance, Security & Defence, Computer Aided Engineering, Media & Entertainment 
–http://acceleware.com/services 
GPU Accelerated Software 
–Seismic imaging & modeling 
–Electromagnetics 
Seismic Imaging & Modeling 
AxWave 
–Seismic forward modeling 
–2D, 3D, constant and variable density models 
–High fidelity finite-difference modeling 
AxRTM 
–High performance Reverse Time Migration Application 
–Isotropic, VTI and TTI media 
HPC Implementation 
–Optimized for GPUs 
–Efficient multi-GPU scaling 
Electromagnetics 
AxFDTD™ 
–Finite-Difference Time-Domain Electromagnetic Solver 
–Optimized for GPUs 
–Sub-gridding and large feature coverage 
–Multi-GPU, GPU clusters, GPU targeting 
–Available from: 
Consulting Services 
Finance
–Application: Option Pricing
–Work completed: Debugged and optimized existing code. Implemented the Leisen-Reimer version of the binomial model for stock option pricing.
–Results: 30-50x performance improvement compared to single-threaded CPU code.
Security & Defense
–Application: Detection System
–Work completed: Replaced legacy Cell-based infrastructure with GPUs. Implemented GPU-accelerated X-ray iterative image reconstruction and explosive detection algorithms.
–Results: Surpassed the performance targets. Reduced hardware cost by a factor of 10.
CAE
–Application: SIMULIA Abaqus
–Work completed: Developed a GPU-accelerated version. Conducted a finite-element analysis and developed a library to offload the LDLT factorization portion of the multi-frontal solver to GPUs.
–Results: Delivered an accelerated (2-3x) solution that supports NVIDIA and AMD GPUs.
Medical
–Application: CT Reconstruction Software
–Work completed: Developed a GPU-accelerated application for image reconstruction on CT scanners and implemented advanced features including a job batch manager, filtering, and bad pixel corrections.
–Results: Accelerated back projection by 31x.
Oil & Gas
–Application: Seismic Application
–Work completed: Converted MATLAB research code into a standalone application and improved performance via algorithmic optimizations.
–Results: 20-30x speedup.
Programmer Training 
OpenCL, CUDA, OpenMP 
Teachers with real world experience 
Hands-on lab exercises 
Progressive lectures 
Small class sizes to maximize learning 
90 days post training support 
“The level of detail is fantastic. The course did not focus on syntax but rather on how to expertly program for the GPU. I loved the course and I hope that we can get more of our team to take it.” 
Jason Gauci, Software Engineer 
Lockheed Martin 
Outline 
Introduction to the OpenCL Architecture 
–Contexts, Devices, Queues 
Memory and Error Management 
Data-Parallel Computing 
–Kernel Launches 
GPU Kernels 
Introduction To The OpenCL Architecture
OpenCL Architecture Introduction and Terminology 
Four high level models describe the key OpenCL concepts: 
–Platform Model – high level host/device interaction 
–Execution Model – OpenCL programs execute on host/device 
–Memory Model – different memory resources on device 
–Programming Model – types of parallel workloads 
OpenCL Platform Model 
A host connected to one or more devices 
–Example: GPUs, DSPs, FPGAs 
A program can work with devices from multiple vendors 
A platform is a host and a collection of devices that share resources and execute programs 
[Figure: a host connected to Device 1 (GPU), Device 2 (CPU), …, Device N (GPU)]
OpenCL Execution Model 
The host defines a context to control the device 
–The context manages the following resources: 
–Devices – hardware to run on 
–Kernels – functions to run on the hardware 
–Program Objects – device executables 
–Memory Objects – memory visible to host and device 
A command queue schedules commands for execution on the device 
OpenCL API - Platform and Runtime Layer 
The OpenCL API is divided into two layers: Platform and Runtime 
The platform layer allows the host program to discover devices and capabilities 
The runtime layer allows the host program to work with contexts once created 
Program Set Up 
To set up an OpenCL program, the typical steps are as follows: 
1.Query and select the platforms (e.g., AMD) 
2.Query the devices 
3.Create a context 
4.Create a command queue 
5.Read/Write to the device 
6.Launch the kernel 
Steps 1-3 use the platform layer; steps 4-6 use the runtime layer.
Sample Platform Layer C Code 
//Get the platform ID 
cl_platform_id platform; 
clGetPlatformIDs(1, &platform, NULL); 
// Get the first GPU device associated with the platform 
cl_device_id device; 
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL); 
//Create an OpenCL context for the GPU device 
cl_context context; 
context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
OpenCL Runtime Layer 
A command queue operates on contexts, memory, and program objects 
Each device can have one or more command queues 
Operations in the command queue will execute in order unless the out of order mode is enabled 
[Figure: a command queue holding the commands Copy Data, Copy Data, Launch Kernel, Copy Data]
Memory and Error Management
OpenCL Buffers 
A buffer stores a one dimensional collection of elements 
Buffer objects use the cl_mem type 
–cl_mem is an abstract memory container (i.e., a handle) 
–The buffer object cannot be dereferenced on the host 
•cl_mem a; a[0] = 5; // Not allowed 
OpenCL commands interact with buffers 
OpenCL Syntax – C Memory Management Example 
Example: 
//Create an OpenCL command queue
cl_int err;
cl_command_queue queue;
queue = clCreateCommandQueue(context, device, 0, &err);
// Allocate memory on device
enum { N = 5 }; // compile-time constant, so hostarr can be initialized in C
size_t nBytes = N * sizeof(int);
cl_mem a = clCreateBuffer(context, CL_MEM_READ_WRITE,
                          nBytes, NULL, &err);
int hostarr[N] = {3,1,4,1,5};
// Transfer memory (blocking write: returns once hostarr can be reused)
err = clEnqueueWriteBuffer(queue, a, CL_TRUE, 0,
                           nBytes, hostarr, 0, NULL,
                           NULL);
OpenCL Syntax – Error Management 
Host code manages errors: 
–Most host-side OpenCL function calls return a cl_int error code
–“Create” calls return the object that is created
•The error code is passed by reference as the last argument
–Error codes are negative values defined in cl.h
•CL_SUCCESS == 0
OpenCL Syntax – Clean Up 
Each object that is created has a matching release function, for example:
–clReleaseContext
–clReleaseCommandQueue
–clReleaseMemObject
–clReleaseProgram
–clReleaseKernel
Data-Parallel Computing
Data-Parallel Computing 
Data-parallelism 
1.Performs operations on a data set organized into a common structure (e.g. an array) 
2.Tasks work collectively on the same structure with each task operating on its own portion of the structure 
3.Tasks perform identical operations on their portions of the structure. Operations on each portion must not be data dependent! 
Data Dependence 
Data dependence occurs when a program statement refers to the data of a preceding statement. 
Data dependence limits parallelism 
a = 2 * x;
b = 2 * y;
c = 3 * x;
These 3 statements are independent!
a = 2 * x;
b = 2 * a * a;
c = b * 9;
b depends on a, c depends on b and a!
Data-Parallel Computing Example 
Data set consisting of arrays A,B, and C 
Same operations performed on each element - Cx = Ax + Bx 
Two tasks operating on a subset of the arrays. Tasks 0 and 1 are independent. Could have more tasks. 
[Figure: arrays A, B, and C with elements 0-7; Task 0 and Task 1 each apply the operation Cx = Ax + Bx to a subset of the elements]
The OpenCL Programming Model 
OpenCL is a heterogeneous model, including provisions for both host and device 
[Figure: a host (CPU, chipset, DRAM) connected via PCIe to a device (DSP, GPU, or FPGA) with its own DRAM]
The OpenCL Programming Model 
Data-parallel portions of an algorithm are executed on the device as kernels 
–Kernels are C functions with some restrictions, and a few language extensions 
Only one kernel is executed at a time 
A kernel is executed by many work-items 
–Each work-item executes the same kernel 
OpenCL Work-Items 
OpenCL work-items are conceptually similar to data-parallel tasks or threads
–Each work-item performs the same operations on a subset of a data structure 
–Work-items execute independently 
OpenCL work-items are not CPU threads 
–OpenCL work-items are extremely lightweight 
•Little creation overhead 
•Instant context-switching 
–Work-items must execute the same kernel 
OpenCL Work-Item Hierarchy 
OpenCL is designed to execute millions of work-items 
Work-items are grouped together into work-groups 
–Maximum # of work-items per work-group (HW limit) 
–Query CL_DEVICE_MAX_WORK_GROUP_SIZE with clGetDeviceInfo
•Typically 256-1024 
The entire collection of work-items is called the N-Dimensional Range (NDRange)
OpenCL Work-Item Hierarchy 
Work-groups and NDRange can be 1D, 2D, or 3D 
Dimensions set at launch time 
[Figure: a 2D NDRange of 3 x 2 work-groups, (0,0) through (2,1); Work-Group (1,1) is expanded to show a 4 x 3 grid of work-items, (0,0) through (3,2)]
The OpenCL Programming Model 
The host launches kernels 
The host executes serial code between device kernel launches 
–Memory management 
–Data exchange to/from device (usually) 
–Error handling 
[Figure: execution alternates between host and device; each kernel launch runs an NDRange of work-groups on the device, 2 x 3 work-groups in the first launch and 3 x 2 in the second]
Data-Parallel Computing on GPUs 
Data-parallel computing maps well to GPUs: 
–Identical operations executed on many data elements in parallel 
•Simplified flow control allows increased ratio of compute logic (ALUs) to control logic 
[Figure: block diagrams of a CPU (ALUs, control logic, L1/L2/L3 caches, DRAM) and a GPU (with its own DRAM)]
OpenCL API – Launching a Kernel in C
How to launch a kernel:
//3D NDRange, let the OpenCL runtime determine
//the local work size.
size_t const globalWorkSize3D[3] = {512,512,512};
clEnqueueNDRangeKernel(queue, kernel, 3, NULL, globalWorkSize3D, NULL,
                       0, NULL, NULL);
//2D NDRange, specify the local work size
size_t const globalWorkSize2D[2] = {512,512};
size_t const localWorkSize[2] = {16, 16};
clEnqueueNDRangeKernel(queue, kernel, 2, NULL,
                       globalWorkSize2D, localWorkSize,
                       0, NULL, NULL);
GPU Kernels
Writing OpenCL Kernels 
Denoted by __kernel function qualifier 
–Eg. __kernel void myKernel(__global float* a) 
Queued from host, executed on device 
A few noteworthy restrictions: 
–No access to host memory (in general!) 
–Must return void 
–No function pointers 
–No static variables 
–No recursion (no stack) 
OpenCL Syntax - Kernels 
Kernels have built-in functions: 
–The variable dim ranges from 0 to 2, depending on the dimension of the kernel launch 
–get_work_dim (): number of dimensions in use 
–get_global_id (dim): unique index of a work-item 
–get_global_size (dim): number of global work-items 
OpenCL Syntax – Kernels (Continued) 
Built-in function listing (continued): 
–get_local_id (dim): unique index of the work-item within the work-group 
–get_local_size (dim): number of work-items within the work-group 
–get_group_id (dim): index of the work-group 
–get_num_groups (dim): number of work-groups 
–The work-group size and the total number of work-items cannot change during a kernel call
OpenCL Syntax - Kernels 
Built-in functions are typically used to determine unique work-item identifiers: 
One-dimensional NDRange (get_work_dim() == 1), get_local_size(0) = 5
get_group_id(0):    0            1            2
get_local_id(0):    0 1 2 3 4    0 1 2 3 4    0 1 2 3 4
get_global_id(0):   0 1 2 3 4    5 6 7 8 9    10 11 12 13 14
get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)
OpenCL Syntax – Thread Identifiers 
Result for each kernel launched with the following execution configuration: 
Dimension = 1 Global work size = 12 Local Work Size = 4 
__kernel void MyKernel(__global int* a)
{
    int idx = get_global_id(0);
    a[idx] = 7;
}
a: 7 7 7 7 7 7 7 7 7 7 7 7
__kernel void MyKernel(__global int* a)
{
    int idx = get_global_id(0);
    a[idx] = get_group_id(0);
}
a: 0 0 0 0 1 1 1 1 2 2 2 2
__kernel void MyKernel(__global int* a)
{
    int idx = get_global_id(0);
    a[idx] = get_local_id(0);
}
a: 0 1 2 3 0 1 2 3 0 1 2 3
Code Example - Kernel 
Kernel is executed by N work-items 
–Each work-item has a unique ID between 0 and N-1 
Serial CPU code:
void inc(float* a, float b,
         int N)
{
    for(int i = 0; i<N; i++)
        a[i] = a[i] + b;
}
int main()
{
    …
    inc(a,b,N);
}
OpenCL code:
__kernel
void inc(__global float* a,
         float b)
{
    int i = get_global_id(0);
    a[i] = a[i] + b;
}
int main()
{
    …
    clEnqueueNDRangeKernel(…,…);
}
OpenCL Syntax - Kernels 
All C operators are supported 
–eg. +, *, /, ^, >, >> 
Many functions from the standard math library 
–eg. sin(), cos(), ceil(), fabs() 
Can write/call your own non-kernel functions 
–float myDeviceFunction(__global float *a) 
–Non-kernel functions cannot be called by host 
Control flow statements too! 
–eg. if(), while(), for() 
OpenCL Syntax - Synchronization 
Kernel launches are asynchronous 
–Control returns to CPU immediately 
–Subsequent commands added to the command queue will wait until the kernel has completed 
–If you want to synchronize on the host: 
•Implicit synchronization via blocking commands 
–eg. clEnqueueReadBuffer() with the blocking argument set to CL_TRUE 
–Explicitly call clFinish() 
clFinish(queue) 
–Blocks on host until all outstanding OpenCL commands are complete in a given queue 
Questions? 
OpenCL training courses and consulting services 
Acceleware Ltd. 
Twitter: @Acceleware 
Web: http://acceleware.com/opencl-training 
Email: services@acceleware.com 
------------------- 
Stay in the know about developer news, tools, SDKs, technical presentations, events and future webinars. Connect with AMD Developer Central here: 
AMD Developer Central 
Twitter: @AMDDevCentral 
Web: http://developer.amd.com/ 
YouTube: https://www.youtube.com/user/AMDDevCentral 
Developer Forums: http://devgurus.amd.com/welcome 
An Overview of GPU Hardware
What is the GPU? 
The GPU is a graphics processing unit 
Historically used to offload graphics computations from the CPU 
Can either be a dedicated video card, integrated on the motherboard or on the same die as the CPU 
–Highest performance will require a dedicated video card 
Why use GPUs? Performance! 
Metric | Intel Xeon E5-2697 v2 (Ivy Bridge) | AMD Opteron 6386SE (Bulldozer) | AMD FirePro W9100 (Volcanic Islands) | AMD FirePro S10000 (Southern Islands)
Processing Cores | 12 | 16 | 2816 | 3584
Clock Frequency | 2.7-3.4* GHz | 2.8-3.5* GHz | 930 MHz | 825 MHz
Memory Bandwidth | 59.7 GB/s per socket | 59.7 GB/s per socket | 320 GB/s | 480 GB/s
Peak Gflops** (single) | 576 @ 3.0 GHz | 410 @ 3.2 GHz | 5240 | 5910
Peak Gflops** (double) | 288 @ 3.0 GHz | 205 @ 3.2 GHz | 2620 | 1480
Gflops/Watt (single) | 4.4 | 2.9 | 19 | 15.76
Total Memory | >>16 GB | >>16 GB | 16 GB | 6 GB
*Indicates range of clock frequencies supported via Intel Turbo Boost and AMD Turbo CORE Technology
**At maximum frequency when all cores are executing
GPU Potential Advantages
AMD FirePro W9100 vs. Xeon E5-2697 v2:
–9x more single-precision floating-point throughput
–9x more double-precision floating-point throughput
–5x higher memory bandwidth
GPU Disadvantages 
Architecture not as flexible as CPU 
Must rewrite algorithms and maintain software in GPU languages 
Attached to CPU via relatively slow PCIe 
–About 16 GB/s in each direction for PCIe 3.0 x16
Limited memory (though 6-16GB is reasonable for many applications) 
Software Approaches for Acceleration 
Programming Languages: maximum flexibility
–OpenCL
OpenACC Directives: simple programming for heterogeneous systems
–Simple compiler hints/pragmas
–Compiler parallelizes code
–Target a variety of platforms
Libraries: “drop-in” acceleration
–In-depth GPU knowledge not required
–Highly optimized by GPU experts
–Provides functions used in a broad range of applications (e.g., FFT, BLAS)
[Figure: the three approaches arranged along an “Effort” axis]
An Introduction to OpenCL
OpenCL Overview 
Parallel computing architecture standardized by the Khronos Group 
OpenCL: 
–Is a royalty free standard 
–Provides an API to coordinate parallel computation across heterogeneous processors 
–Defines a cross-platform programming language 
OpenCL Versions 
To date there are four different versions of OpenCL 
–OpenCL 1.0 
–OpenCL 1.1 
–OpenCL 1.2 
–OpenCL 2.0 (finalized November 2013) 
Different versions support different functionality 
Hardware Vendor | Supported OpenCL Version
AMD | OpenCL 1.2
Intel | OpenCL 1.2
NVIDIA | OpenCL 1.1
OpenCL Extensions 
Optional functionality is exposed through extensions 
–Vendors are not required to support extensions to achieve conformance 
–However, extensions are expected to be widely available 
Some OpenCL extensions are approved by the OpenCL working group 
–These extensions are likely to be promoted to core functionality in future versions of the standard 
Multi-vendor and vendor specific extensions do not need approval by the working group 
52

Contenu connexe

Tendances

Fault Simulation (Testing of VLSI Design)
Fault Simulation (Testing of VLSI Design)Fault Simulation (Testing of VLSI Design)
Fault Simulation (Testing of VLSI Design)Usha Mehta
 
High performance computing tutorial, with checklist and tips to optimize clus...
High performance computing tutorial, with checklist and tips to optimize clus...High performance computing tutorial, with checklist and tips to optimize clus...
High performance computing tutorial, with checklist and tips to optimize clus...Pradeep Redddy Raamana
 
Verilog Lecture1
Verilog Lecture1Verilog Lecture1
Verilog Lecture1Béo Tú
 
Testing and Verification of Electronics Circuits : Introduction
Testing and Verification of Electronics Circuits : IntroductionTesting and Verification of Electronics Circuits : Introduction
Testing and Verification of Electronics Circuits : IntroductionUsha Mehta
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageKernel TLV
 
14 static timing_analysis_5_clock_domain_crossing
14 static timing_analysis_5_clock_domain_crossing14 static timing_analysis_5_clock_domain_crossing
14 static timing_analysis_5_clock_domain_crossingUsha Mehta
 
Verification Engineer - Opportunities and Career Path
Verification Engineer - Opportunities and Career PathVerification Engineer - Opportunities and Career Path
Verification Engineer - Opportunities and Career PathRamdas Mozhikunnath
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunheut2008
 
System design techniques and networks
System design techniques and networksSystem design techniques and networks
System design techniques and networksRAMPRAKASHT1
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingMichelle Holley
 
Part-1 : Mastering microcontroller with embedded driver development
Part-1 : Mastering microcontroller with embedded driver development Part-1 : Mastering microcontroller with embedded driver development
Part-1 : Mastering microcontroller with embedded driver development FastBit Embedded Brain Academy
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixBrendan Gregg
 
Ibm power sales bootcamp
Ibm power sales bootcampIbm power sales bootcamp
Ibm power sales bootcampsolarisyougood
 
Linux on RISC-V with Open Hardware (ELC-E 2020)
Linux on RISC-V with Open Hardware (ELC-E 2020)Linux on RISC-V with Open Hardware (ELC-E 2020)
Linux on RISC-V with Open Hardware (ELC-E 2020)Drew Fustini
 

Tendances (20)

Fault Simulation (Testing of VLSI Design)
Fault Simulation (Testing of VLSI Design)Fault Simulation (Testing of VLSI Design)
Fault Simulation (Testing of VLSI Design)
 
High performance computing tutorial, with checklist and tips to optimize clus...
High performance computing tutorial, with checklist and tips to optimize clus...High performance computing tutorial, with checklist and tips to optimize clus...
High performance computing tutorial, with checklist and tips to optimize clus...
 
Verilog Lecture1
Verilog Lecture1Verilog Lecture1
Verilog Lecture1
 
Testing and Verification of Electronics Circuits : Introduction
Testing and Verification of Electronics Circuits : IntroductionTesting and Verification of Electronics Circuits : Introduction
Testing and Verification of Electronics Circuits : Introduction
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
14 static timing_analysis_5_clock_domain_crossing
14 static timing_analysis_5_clock_domain_crossing14 static timing_analysis_5_clock_domain_crossing
14 static timing_analysis_5_clock_domain_crossing
 
Verification Engineer - Opportunities and Career Path
Verification Engineer - Opportunities and Career PathVerification Engineer - Opportunities and Career Path
Verification Engineer - Opportunities and Career Path
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zun
 
ARM and SoC Traning Part I -- Overview
ARM and SoC Traning Part I -- OverviewARM and SoC Traning Part I -- Overview
ARM and SoC Traning Part I -- Overview
 
System design techniques and networks
System design techniques and networksSystem design techniques and networks
System design techniques and networks
 
Vlsi design flow
Vlsi design flowVlsi design flow
Vlsi design flow
 
GDB Rocks!
GDB Rocks!GDB Rocks!
GDB Rocks!
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet Processing
 
Part-1 : Mastering microcontroller with embedded driver development
Part-1 : Mastering microcontroller with embedded driver development Part-1 : Mastering microcontroller with embedded driver development
Part-1 : Mastering microcontroller with embedded driver development
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
 
Ibm power sales bootcamp
Ibm power sales bootcampIbm power sales bootcamp
Ibm power sales bootcamp
 
Dissecting the Differences Between Pyranometer and Reference Cell Irradiance ...
Dissecting the Differences Between Pyranometer and Reference Cell Irradiance ...Dissecting the Differences Between Pyranometer and Reference Cell Irradiance ...
Dissecting the Differences Between Pyranometer and Reference Cell Irradiance ...
 
system verilog
system verilogsystem verilog
system verilog
 
Embedded development life cycle
Embedded development life cycleEmbedded development life cycle
Embedded development life cycle
 
Linux on RISC-V with Open Hardware (ELC-E 2020)
Linux on RISC-V with Open Hardware (ELC-E 2020)Linux on RISC-V with Open Hardware (ELC-E 2020)
Linux on RISC-V with Open Hardware (ELC-E 2020)
 

En vedette

GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar

  • 1. An Introduction to OpenCL™ Using AMD GPUs. Chris Mason, Product Manager, Acceleware. September 17, 2014
  • 2. © 2014 Acceleware Ltd. Reproduction or distribution strictly prohibited. An Introduction to OpenCL Using AMD GPUs
    About Acceleware
    Programmer Training
    –OpenCL, CUDA, OpenMP
    –Over 100 courses taught
    –http://acceleware.com/training
    Consulting Services
    –Completed projects for: Oil & Gas, Medical, Finance, Security & Defence, Computer Aided Engineering, Media & Entertainment
    –http://acceleware.com/services
    GPU Accelerated Software
    –Seismic imaging & modeling
    –Electromagnetics
  • 3. Seismic Imaging & Modeling
      AxWave
      – Seismic forward modeling
      – 2D, 3D, constant and variable density models
      – High fidelity finite-difference modeling
      AxRTM
      – High performance Reverse Time Migration application
      – Isotropic, VTI and TTI media
      HPC Implementation
      – Optimized for GPUs
      – Efficient multi-GPU scaling
  • 4. Electromagnetics
      AxFDTD™
      – Finite-Difference Time-Domain Electromagnetic Solver
      – Optimized for GPUs
      – Sub-gridding and large feature coverage
      – Multi-GPU, GPU clusters, GPU targeting
      – Available from:
  • 5. Consulting Services
      Industry: Finance
      Application: Option Pricing
      Work completed: Debugged & optimized existing code. Implemented the Leisen-Reimer version of the binomial model for stock option pricing.
      Results: 30-50x performance improvement compared to single-threaded CPU code

      Industry: Security & Defense
      Application: Detection System
      Work completed: Replaced legacy Cell-based infrastructure with GPUs. Implemented GPU accelerated X-ray iterative image reconstruction and explosive detection algorithms.
      Results: Surpassed the performance targets. Reduced hardware cost by a factor of 10.

      Industry: CAE
      Application: SIMULIA Abaqus
      Work completed: Developed a GPU accelerated version. Conducted a finite-element analysis and developed a library to offload the LDLT factorization portion of the multi-frontal solver to GPUs.
      Results: Delivered an accelerated (2-3x) solution that supports NVIDIA and AMD GPUs

      Industry: Medical
      Application: CT Reconstruction Software
      Work completed: Developed a GPU accelerated application for image reconstruction on CT scanners and implemented advanced features including a job batch manager, filtering and bad pixel corrections.
      Results: Accelerated back projection by 31x

      Industry: Oil & Gas
      Application: Seismic Application
      Work completed: Converted MATLAB research code into a standalone application & improved performance via algorithmic optimizations.
      Results: 20-30x speedup
  • 6. Programmer Training
      – OpenCL, CUDA, OpenMP
      – Teachers with real world experience
      – Hands-on lab exercises
      – Progressive lectures
      – Small class sizes to maximize learning
      – 90 days post training support
      “The level of detail is fantastic. The course did not focus on syntax but rather on how to expertly program for the GPU. I loved the course and I hope that we can get more of our team to take it.”
      Jason Gauci, Software Engineer, Lockheed Martin
  • 7. Outline
      Introduction to the OpenCL Architecture
      – Contexts, Devices, Queues
      Memory and Error Management
      Data-Parallel Computing
      – Kernel Launches
      GPU Kernels
  • 8. Introduction To The OpenCL Architecture
  • 9. OpenCL Architecture Introduction and Terminology
      Four high level models describe the key OpenCL concepts:
      – Platform Model: high level host/device interaction
      – Execution Model: OpenCL programs execute on host/device
      – Memory Model: different memory resources on device
      – Programming Model: types of parallel workloads
  • 10. OpenCL Platform Model
      A host connected to one or more devices
      – Example: GPUs, DSPs, FPGAs
      A program can work with devices from multiple vendors
      A platform is a host and a collection of devices that share resources and execute programs
      [Figure: a host connected to Device 1 (GPU), Device 2 (CPU), … Device N (GPU)]
  • 11. OpenCL Execution Model
      The host defines a context to control the device
      – The context manages the following resources:
        • Devices: hardware to run on
        • Kernels: functions to run on the hardware
        • Program Objects: device executables
        • Memory Objects: memory visible to host and device
      A command queue schedules commands for execution on the device
  • 12. OpenCL API - Platform and Runtime Layer
      The OpenCL API is divided into two layers: Platform and Runtime
      – The platform layer allows the host program to discover devices and capabilities
      – The runtime layer allows the host program to work with contexts once created
  • 13. Program Set Up
      To set up an OpenCL program, the typical steps are as follows:
      1. Query and select the platforms (e.g., AMD)
      2. Query the devices
      3. Create a context
      4. Create a command queue
      5. Read/Write to the device
      6. Launch the kernel
      Steps 1 to 3 belong to the platform layer; steps 4 to 6 belong to the runtime layer.
  • 14. Sample Platform Layer C Code

      // Get the platform ID
      cl_platform_id platform;
      clGetPlatformIDs(1, &platform, NULL);

      // Get the first GPU device associated with the platform
      cl_device_id device;
      clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

      // Create an OpenCL context for the GPU device
      cl_context context;
      context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
  • 15. OpenCL Runtime Layer
      A command queue operates on contexts, memory, and program objects
      Each device can have one or more command queues
      Operations in the command queue will execute in order unless the out-of-order mode is enabled
      [Figure: a command queue holding Copy Data, Copy Data, Launch Kernel, Copy Data]
  • 16. Memory and Error Management
  • 17. OpenCL Buffers
      A buffer stores a one dimensional collection of elements
      Buffer objects use the cl_mem type
      – cl_mem is an abstract memory container (i.e., a handle)
      – The buffer object cannot be dereferenced on the host
        • cl_mem a; a[0] = 5; // Not allowed
      OpenCL commands interact with buffers
  • 18. OpenCL Syntax – C Memory Management Example

      // Create an OpenCL command queue
      cl_int err;
      cl_command_queue queue;
      queue = clCreateCommandQueue(context, device, 0, &err);

      // Allocate memory on the device
      enum { N = 5 };  // compile-time constant, so hostarr can take an initializer in C
      size_t nBytes = N * sizeof(int);
      cl_mem a = clCreateBuffer(context, CL_MEM_READ_WRITE, nBytes, NULL, &err);
      int hostarr[N] = {3, 1, 4, 1, 5};

      // Transfer memory (CL_TRUE makes this a blocking write)
      err = clEnqueueWriteBuffer(queue, a, CL_TRUE, 0, nBytes, hostarr, 0, NULL, NULL);
  • 19. OpenCL Syntax – Error Management
      Host code manages errors:
      – Most host side OpenCL function calls return a cl_int error code
      – “Create” calls return the object that is created instead
        • Their error code is passed back by reference through the last argument
      – Error codes are negative values defined in cl.h
      – CL_SUCCESS == 0
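As a minimal sketch of the error-code convention on slide 19, the helper below turns a few codes into readable names. The function names `cl_error_name` and `cl_failed` are hypothetical, and the numeric values are hard-coded here only so the sketch compiles without an OpenCL SDK; they are assumed to match the Khronos `cl.h` header, and real host code should include `<CL/cl.h>` and use the named constants.

```c
/* Hypothetical helper: maps a handful of OpenCL error codes to their names.
 * The literals are assumed to mirror the Khronos cl.h definitions. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    default:  return "unlisted OpenCL error";
    }
}

/* Any value other than CL_SUCCESS (0) is treated as a failure. */
int cl_failed(int err)
{
    return err != 0;
}
```

A caller would typically test the `cl_int` returned by an enqueue call, or the value written through the last argument of a "create" call, and log `cl_error_name(err)` when `cl_failed(err)` is true.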
  • 20. OpenCL Syntax – Clean Up
      All objects that are created can be released with the following functions:
      – clReleaseContext
      – clReleaseCommandQueue
      – clReleaseMemObject
  • 21. Data-Parallel Computing
  • 22. Data-Parallel Computing
      Data-parallelism:
      1. Performs operations on a data set organized into a common structure (e.g. an array)
      2. Tasks work collectively on the same structure, with each task operating on its own portion of the structure
      3. Tasks perform identical operations on their portions of the structure. Operations on each portion must not be data dependent!
  • 23. Data Dependence
      Data dependence occurs when a program statement refers to the data of a preceding statement.
      Data dependence limits parallelism.

      a = 2 * x;
      b = 2 * y;
      c = 3 * x;
      These 3 statements are independent!

      a = 2 * x;
      b = 2 * a * a;
      c = b * 9;
      b depends on a, c depends on b and a!
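The distinction on slide 23 can be sketched at loop level in plain C. The function names below are illustrative only. The first two functions perform the same independent computation in opposite orders and produce identical results; the third carries a dependence from one iteration to the next, so its iterations cannot simply be handed to independent tasks.

```c
#include <stddef.h>

/* Independent: out[i] depends only on x[i], so any iteration order is valid. */
void scale_forward(const int *x, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = 2 * x[i];
}

/* Same computation visited in reverse order; the result is identical. */
void scale_reverse(const int *x, int *out, size_t n)
{
    for (size_t i = n; i-- > 0; )
        out[i] = 2 * x[i];
}

/* Dependent: out[i] needs out[i - 1], so iterations must run in order. */
void running_sum(const int *x, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (i == 0 ? 0 : out[i - 1]) + x[i];
}
```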
  • 24. Data-Parallel Computing Example
      Data set consisting of arrays A, B, and C
      Same operation performed on each element: Cx = Ax + Bx
      Two tasks operating on a subset of the arrays. Tasks 0 and 1 are independent. Could have more tasks.
      [Figure: arrays A0–A7, B0–B7 and C0–C7; Task 0 and Task 1 each apply the operation Cx = Ax + Bx to their own subset]
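A serial C sketch of the example on slide 24 follows. The helper names and the split point (the first half to Task 0, the second half to Task 1) are assumptions; the transcript does not say which elements each task owns. Because neither task reads anything the other writes, the two calls could run in either order or concurrently.

```c
#include <stddef.h>

/* One task: applies C[x] = A[x] + B[x] to the half-open range [begin, end). */
void add_range(const float *a, const float *b, float *c,
               size_t begin, size_t end)
{
    for (size_t x = begin; x < end; x++)
        c[x] = a[x] + b[x];
}

/* Two independent tasks cover the arrays; the midpoint split is an assumption. */
void add_two_tasks(const float *a, const float *b, float *c, size_t n)
{
    size_t half = n / 2;
    add_range(a, b, c, 0, half);   /* Task 0 */
    add_range(a, b, c, half, n);   /* Task 1 */
}
```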
  • 25. The OpenCL Programming Model
      OpenCL is a heterogeneous model, including provisions for both host and device
      [Figure: Host (CPU, chipset, DRAM) connected over PCIe to the Device (DSP, GPU or FPGA, with its own DRAM)]
  • 26. The OpenCL Programming Model
      Data-parallel portions of an algorithm are executed on the device as kernels
      – Kernels are C functions with some restrictions, and a few language extensions
      Only one kernel is executed at a time
      A kernel is executed by many work-items
      – Each work-item executes the same kernel
  • 27. OpenCL Work-Items
      OpenCL work-items are conceptually similar to data-parallel tasks or threads
      – Each work-item performs the same operations on a subset of a data structure
      – Work-items execute independently
      OpenCL work-items are not CPU threads
      – OpenCL work-items are extremely lightweight
        • Little creation overhead
        • Instant context-switching
      – Work-items must execute the same kernel
  • 28. OpenCL Work-Item Hierarchy
      OpenCL is designed to execute millions of work-items
      Work-items are grouped together into work-groups
      – Maximum number of work-items per work-group (hardware limit)
      – Query CL_DEVICE_MAX_WORK_GROUP_SIZE with clGetDeviceInfo
        • Typically 256-1024
      The entire collection of work-items is called the N-Dimensional Range (NDRange)
  • 29. OpenCL Work-Item Hierarchy
      Work-groups and NDRange can be 1D, 2D, or 3D
      Dimensions set at launch time
      [Figure: an NDRange made of work-groups (0,0) to (2,1); Work-Group (1,1) is expanded to show its work-items (0,0) to (3,2)]
  • 30. The OpenCL Programming Model
      The host launches kernels
      The host executes serial code between device kernel launches
      – Memory management
      – Data exchange to/from device (usually)
      – Error handling
      [Figure: execution alternates between host and device; one kernel launch runs an NDRange of work-groups (0,0) to (1,2), the next an NDRange of work-groups (0,0) to (2,1)]
  • 31. Data-Parallel Computing on GPUs
      Data-parallel computing maps well to GPUs:
      – Identical operations executed on many data elements in parallel
        • Simplified flow control allows increased ratio of compute logic (ALUs) to control logic
      [Figure: CPU and GPU block diagrams, each with DRAM, showing the relative share of ALUs, control logic and L1/L2/L3 cache]
  • 32. OpenCL API – Launching a Kernel (C)
      How to launch a kernel:

      Example 1:
      // 3D NDRange; let the OpenCL runtime determine the local work size
      size_t const globalWorkSize[3] = {512, 512, 512};
      clEnqueueNDRangeKernel(queue, kernel, 3, NULL, globalWorkSize, NULL, 0, NULL, NULL);

      Example 2:
      // 2D NDRange; specify the local work size
      size_t const globalWorkSize[2] = {512, 512};
      size_t const localWorkSize[2] = {16, 16};
      clEnqueueNDRangeKernel(queue, kernel, 2, NULL, globalWorkSize, localWorkSize, 0, NULL, NULL);
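For the 2D launch on slide 32, the work-group count follows directly from the two size arrays. The plain C sketch below (hypothetical helper names, no OpenCL calls) computes it and also checks divisibility: in OpenCL 1.x an explicitly specified local size must evenly divide the global size in each dimension, otherwise clEnqueueNDRangeKernel reports CL_INVALID_WORK_GROUP_SIZE.

```c
#include <stddef.h>

/* Work-groups along one dimension, or 0 if the local size does not
 * evenly divide the global size (not launchable under OpenCL 1.x rules). */
size_t groups_in_dim(size_t global_size, size_t local_size)
{
    if (local_size == 0 || global_size % local_size != 0)
        return 0;
    return global_size / local_size;
}

/* Total work-groups in an NDRange with the given number of dimensions. */
size_t total_groups(const size_t *global, const size_t *local, size_t dims)
{
    size_t total = 1;
    for (size_t d = 0; d < dims; d++)
        total *= groups_in_dim(global[d], local[d]);
    return total;
}
```

With a global size of {512, 512} and a local size of {16, 16}, this gives 32 work-groups per dimension and 1024 work-groups in total, each containing 256 work-items.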
  • 33. GPU Kernels
  • 34. Writing OpenCL Kernels
      Denoted by the __kernel function qualifier
      – e.g. __kernel void myKernel(__global float* a)
      Queued from host, executed on device
      A few noteworthy restrictions:
      – No access to host memory (in general!)
      – Must return void
      – No function pointers
      – No static variables
      – No recursion (no stack)
  • 35. OpenCL Syntax - Kernels
      Kernels have built-in functions:
      – The argument dim ranges from 0 to 2, depending on the dimension of the kernel launch
      – get_work_dim(): number of dimensions in use
      – get_global_id(dim): unique index of a work-item in dimension dim
      – get_global_size(dim): number of global work-items in dimension dim
  • 36. OpenCL Syntax – Kernels (Continued)
      Built-in function listing (continued):
      – get_local_id(dim): unique index of the work-item within the work-group
      – get_local_size(dim): number of work-items within the work-group
      – get_group_id(dim): index of the work-group
      – get_num_groups(dim): number of work-groups
      – The size of the work-groups and the number of work-items cannot vary during a kernel call
  • 37. OpenCL Syntax - Kernels
      Built-in functions are typically used to determine unique work-item identifiers.
      One dimensional NDRange (get_work_dim() == 1), get_local_size(0) = 5:

      get_group_id(0):   0  0  0  0  0  1  1  1  1  1  2  2  2  2  2
      get_local_id(0):   0  1  2  3  4  0  1  2  3  4  0  1  2  3  4
      get_global_id(0):  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14

      get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)
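The identity on slide 37 and its inverse can be checked with a few lines of host-side C. These are ordinary functions with illustrative names, not the OpenCL built-ins, and the sketch assumes a 1D launch with no global offset.

```c
#include <stddef.h>

/* Global index from a work-group index and an index within the group. */
size_t global_id_from(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}

/* Inverse mapping: which work-group a global index falls in. */
size_t group_id_of(size_t global_id, size_t local_size)
{
    return global_id / local_size;
}

/* Inverse mapping: the position of a global index inside its work-group. */
size_t local_id_of(size_t global_id, size_t local_size)
{
    return global_id % local_size;
}
```

With a local size of 5, as on the slide, the work-item with group index 2 and local index 3 has global index 13.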
  • 38. OpenCL Syntax – Thread Identifiers
      Result for each kernel launched with the following execution configuration:
      Dimension = 1, Global Work Size = 12, Local Work Size = 4

      __kernel void MyKernel(__global int* a)
      {
          int idx = get_global_id(0);
          a[idx] = 7;
      }
      a: 7 7 7 7 7 7 7 7 7 7 7 7

      __kernel void MyKernel(__global int* a)
      {
          int idx = get_global_id(0);
          a[idx] = get_group_id(0);
      }
      a: 0 0 0 0 1 1 1 1 2 2 2 2

      __kernel void MyKernel(__global int* a)
      {
          int idx = get_global_id(0);
          a[idx] = get_local_id(0);
      }
      a: 0 1 2 3 0 1 2 3 0 1 2 3
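The three results on slide 38 can be reproduced without a GPU by emulating the launch serially on the host. Everything below is a hypothetical stand-in: the `work_item` struct replaces the built-in ID functions, and the function pointer is used only because this is ordinary host C (kernels themselves may not use function pointers). A real device runs the work-items in parallel rather than in this loop order.

```c
#include <stddef.h>

/* Hypothetical stand-in for the IDs the OpenCL runtime gives each work-item. */
typedef struct {
    size_t global_id;
    size_t group_id;
    size_t local_id;
} work_item;

typedef void (*kernel_fn)(int *a, work_item id);

/* Host-side equivalents of the three kernel bodies on the slide. */
void fill_seven(int *a, work_item id) { a[id.global_id] = 7; }
void fill_group(int *a, work_item id) { a[id.global_id] = (int)id.group_id; }
void fill_local(int *a, work_item id) { a[id.global_id] = (int)id.local_id; }

/* Serial emulation of a 1D launch; global_size must be a multiple of local_size. */
void emulate_launch_1d(kernel_fn k, int *a, size_t global_size, size_t local_size)
{
    for (size_t group = 0; group < global_size / local_size; group++) {
        for (size_t local = 0; local < local_size; local++) {
            work_item id = { group * local_size + local, group, local };
            k(a, id);
        }
    }
}
```

Calling `emulate_launch_1d(fill_group, a, 12, 4)` on a 12-element array fills it with the group indices in the same pattern as the second result on the slide.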
  • 39. Code Example - Kernel
      Kernel is executed by N work-items
      – Each work-item has a unique ID between 0 and N-1

      Serial CPU version:
      void inc(float* a, float b, int N)
      {
          for (int i = 0; i < N; i++)
              a[i] = a[i] + b;
      }

      int main(void)
      {
          …
          inc(a, b, N);
      }

      OpenCL version:
      __kernel void inc(__global float* a, float b)
      {
          int i = get_global_id(0);
          a[i] = a[i] + b;
      }

      int main(void)
      {
          …
          clEnqueueNDRangeKernel(…,…);
      }
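The inc kernel on slide 39 assumes exactly N work-items. A common pattern, not shown in the slides, is to round the global size up to a multiple of the local size and guard the body so the surplus work-items do nothing. The plain C sketch below emulates that pattern serially; `round_up` and `inc_guarded` are hypothetical names, and in a real kernel N would have to be passed as an extra argument.

```c
#include <stddef.h>

/* Smallest multiple of `multiple` that is greater than or equal to n. */
size_t round_up(size_t n, size_t multiple)
{
    return (n + multiple - 1) / multiple * multiple;
}

/* Serial emulation of a guarded inc kernel launched with a padded global size. */
void inc_guarded(float *a, float b, size_t n, size_t local_size)
{
    size_t global_size = round_up(n, local_size);
    for (size_t i = 0; i < global_size; i++) {  /* i plays the role of get_global_id(0) */
        if (i < n)                              /* surplus work-items skip the write */
            a[i] = a[i] + b;
    }
}
```

For example, 1000 elements with a local size of 256 would be launched with a global size of 1024, and the last 24 work-items would fail the guard.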
  • 40. OpenCL Syntax - Kernels
      All C operators are supported
      – e.g. +, *, /, ^, >, >>
      Many functions from the standard math library
      – e.g. sin(), cos(), ceil(), fabs()
      Can write/call your own non-kernel functions
      – float myDeviceFunction(__global float *a)
      – Non-kernel functions cannot be called by the host
      Control flow statements too!
      – e.g. if(), while(), for()
  • 41. OpenCL Syntax - Synchronization
      Kernel launches are asynchronous
      – Control returns to the CPU immediately
      – Subsequent commands added to an in-order command queue will wait until the kernel has completed
      – If you want to synchronize on the host:
        • Implicit synchronization via blocking commands, e.g. clEnqueueReadBuffer() with the blocking argument set to CL_TRUE
        • Explicitly call clFinish()
      clFinish(queue)
      – Blocks on the host until all outstanding OpenCL commands in the given queue are complete
  • 42. Questions?
      OpenCL training courses and consulting services
      Acceleware Ltd.
      – Twitter: @Acceleware
      – Web: http://acceleware.com/opencl-training
      – Email: services@acceleware.com
      Stay in the know about developer news, tools, SDKs, technical presentations, events and future webinars. Connect with AMD Developer Central here:
      AMD Developer Central
      – Twitter: @AMDDevCentral
      – Web: http://developer.amd.com/
      – YouTube: https://www.youtube.com/user/AMDDevCentral
      – Developer Forums: http://devgurus.amd.com/welcome
  • 43. An Overview of GPU Hardware
  • 44. What is the GPU?
      The GPU is a graphics processing unit
      Historically used to offload graphics computations from the CPU
      Can either be a dedicated video card, integrated on the motherboard, or on the same die as the CPU
      – Highest performance will require a dedicated video card
  • 45. Why use GPUs? Performance!

      Metric | Intel Xeon E5-2697 v2 (Ivy Bridge) | AMD Opteron 6386SE (Bulldozer) | AMD FirePro W9100 (Volcanic Islands) | AMD FirePro S10000 (Southern Islands)
      Processing Cores | 12 | 16 | 2816 | 3584
      Clock Frequency | 2.7-3.4* GHz | 2.8-3.5* GHz | 930 MHz | 825 MHz
      Memory Bandwidth | 59.7 GB/s per socket | 59.7 GB/s per socket | 320 GB/s | 480 GB/s
      Peak Gflops** (single) | 576 @ 3.0 GHz | 410 @ 3.2 GHz | 5240 | 5910
      Peak Gflops** (double) | 288 @ 3.0 GHz | 205 @ 3.2 GHz | 2620 | 1480
      Gflops/Watt (single) | 4.4 | 2.9 | 19 | 15.76
      Total Memory | >>16 GB | >>16 GB | 16 GB | 6 GB

      * Indicates range of clock frequencies supported via Intel Turbo Boost and AMD Turbo CORE Technology
      ** At maximum frequency when all cores are executing
  • 46. GPU Potential Advantages
      AMD FirePro W9100 vs. Xeon E5-2697 v2:
      – 9x more single-precision floating-point throughput
      – 9x more double-precision floating-point throughput
      – 5x higher memory bandwidth
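The rounded factors on slide 46 appear to follow from the W9100 and Xeon E5-2697 v2 columns of the table on slide 45. As a quick check using those published figures:

```latex
\begin{align*}
\text{single precision:} \quad & \frac{5240\ \text{Gflops}}{576\ \text{Gflops}} \approx 9.1 \\
\text{double precision:} \quad & \frac{2620\ \text{Gflops}}{288\ \text{Gflops}} \approx 9.1 \\
\text{memory bandwidth:} \quad & \frac{320\ \text{GB/s}}{59.7\ \text{GB/s}} \approx 5.4
\end{align*}
```

These are ratios of peak figures, with the CPU bandwidth taken per socket, so they describe potential rather than measured application speedups.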
  • 47. GPU Disadvantages
      Architecture not as flexible as CPU
      Must rewrite algorithms and maintain software in GPU languages
      Attached to CPU via relatively slow PCIe
      – 16 GB/s bi-directional for PCIe 3.0 x16
      Limited memory (though 6-16 GB is reasonable for many applications)
  • 48. Software Approaches for Acceleration
      Programming Languages: maximum flexibility
      – OpenCL
      OpenACC Directives: simple programming for heterogeneous systems
      – Simple compiler hints/pragmas
      – Compiler parallelizes code
      – Target a variety of platforms
      Libraries: “drop-in” acceleration
      – In-depth GPU knowledge not required
      – Highly optimized by GPU experts
      – Provides functions used in a broad range of applications (e.g. FFT, BLAS)
      [Figure: the three approaches arranged along an axis labeled Effort]
  • 49. An Introduction to OpenCL
  • 50. OpenCL Overview
      Parallel computing architecture standardized by the Khronos Group
      OpenCL:
      – Is a royalty free standard
      – Provides an API to coordinate parallel computation across heterogeneous processors
      – Defines a cross-platform programming language
  • 51. OpenCL Versions
      To date there are four different versions of OpenCL
      – OpenCL 1.0
      – OpenCL 1.1
      – OpenCL 1.2
      – OpenCL 2.0 (finalized November 2013)
      Different versions support different functionality

      Hardware Vendor | Supported OpenCL Version
      AMD | OpenCL 1.2
      Intel | OpenCL 1.2
      NVIDIA | OpenCL 1.1
  • 52. OpenCL Extensions
      Optional functionality is exposed through extensions
      – Vendors are not required to support extensions to achieve conformance
      – However, extensions are expected to be widely available
      Some OpenCL extensions are approved by the OpenCL working group
      – These extensions are likely to be promoted to core functionality in future versions of the standard
      Multi-vendor and vendor specific extensions do not need approval by the working group