HETEROGENEOUS MATH LIBRARIES
KENT KNOX
12/16/2014
2 | HETEROGENEOUS MATH LIBRARIES | DECEMBER 16, 2014
AGENDA
clMATH
‒ clBLAS
‒ clFFT
ACML
clMAGMA
Bolt
LIBRARIES COVERED
A survey of available libraries
CLMATHLIBRARIES
clMathLibraries is a GitHub organization for OpenCL™ math-related subprojects
https://github.com/clMathLibraries
It currently hosts two subprojects: clBLAS & clFFT
Open Source
clBLAS
CLBLAS - HTTPS://GITHUB.COM/CLMATHLIBRARIES/CLBLAS
• clBLAS implements the Netlib BLAS functionality in OpenCL
‒ Level 3 – Matrix x Matrix operations, O(N^3), compute bound
‒ Level 2 – Matrix x Vector operations, O(N^2), mostly memory bound
‒ Level 1 – Vector x Vector operations, O(N), memory bound
• The API follows the Netlib style, but adds OpenCL arguments (cl_mem buffers with offsets, command queues and events)
‒ clblasStatus clblasSgemm(clblasOrder order, clblasTranspose transA, clblasTranspose transB,
    size_t M, size_t N, size_t K, cl_float alpha,
    const cl_mem A, size_t offA, size_t lda,
    const cl_mem B, size_t offB, size_t ldb,
    cl_float beta, cl_mem C, size_t offC, size_t ldc,
    cl_uint numCommandQueues, cl_command_queue* commandQueues,
    cl_uint numEventsInWaitList, const cl_event* eventWaitList, cl_event* events)
• clBLAS assumes that the user is comfortable with OpenCL programming
‒ The host code is responsible for detecting/choosing devices, transferring memory and synchronizing operations
API
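To make the roles of M, N, K, alpha, beta, the leading dimensions and the buffer offsets concrete, here is a minimal host-side sketch of the arithmetic clblasSgemm performs, C = alpha * A * B + beta * C. It assumes the clblasColumnMajor, clblasNoTrans case only and works on std::vector instead of cl_mem; the function name referenceSgemmColMajor is hypothetical and is not part of clBLAS. A CPU reference like this can be handy for validating device results on small inputs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not a clBLAS function) for the math of
// clblasSgemm with column-major storage and no transposes:
//   C = alpha * A * B + beta * C, where A is M x K, B is K x N, C is M x N.
// offA/offB/offC are element offsets into each buffer (as in clBLAS),
// and lda/ldb/ldc are the leading dimensions (distance between columns).
void referenceSgemmColMajor(std::size_t M, std::size_t N, std::size_t K,
                            float alpha,
                            const std::vector<float>& A, std::size_t offA, std::size_t lda,
                            const std::vector<float>& B, std::size_t offB, std::size_t ldb,
                            float beta,
                            std::vector<float>& C, std::size_t offC, std::size_t ldc)
{
    for (std::size_t j = 0; j < N; ++j) {          // column of C
        for (std::size_t i = 0; i < M; ++i) {      // row of C
            float acc = 0.0f;
            for (std::size_t p = 0; p < K; ++p) {
                // element (i, p) of A times element (p, j) of B
                acc += A[offA + i + p * lda] * B[offB + p + j * ldb];
            }
            float& c = C[offC + i + j * ldc];
            c = alpha * acc + beta * c;
        }
    }
}
```

Because lda can be larger than M, the same routine also operates on a sub-matrix embedded in a bigger allocation, which is what the offset and leading-dimension arguments are for.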
• A proof-of-concept Python wrapper for clBLAS has been started, but so far only sgemm is wrapped
‒ https://github.com/clMathLibraries/clBLAS/tree/master/src/wrappers/python
‒ Based on Cython
‒ Works with PyOpenCL to manage OpenCL state
‒ We would love help from the community to finish it
• The community wrote a Julia wrapper for clBLAS
‒ https://github.com/JuliaGPU/CLBLAS.jl
API
clBLAS contains a Tune tool for finding better-performing OpenCL kernels
• The user is responsible for running the tool on their own machine as a preprocessing step
• The tool creates a kernel database file (.kdb) that contains the best-performing kernel for a given BLAS routine
• The .kdb file is specific to an OpenCL device and is named after that device, e.g. tahiti.kdb
• Example
• export CLBLAS_STORAGE_PATH=/usr/local/lib
• ./tune --gemm --double
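The tuning idea can be sketched independently of OpenCL: measure each candidate kernel configuration and keep the fastest one per device and routine. Everything below is hypothetical (KernelConfig, pickFastest, timeSeconds and the in-memory KernelDatabase map are not clBLAS names); the real tool searches clBLAS's own kernel parameters and writes its results to a .kdb file, as described above.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical kernel parameter a tuner might search over.
struct KernelConfig {
    std::size_t tileSize;
};

// Wall-clock time of one call to f, in seconds.
double timeSeconds(const std::function<void()>& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Returns the candidate with the lowest cost reported by `measure`.
// `candidates` must be non-empty (at(0) throws otherwise).
KernelConfig pickFastest(const std::vector<KernelConfig>& candidates,
                         const std::function<double(const KernelConfig&)>& measure) {
    KernelConfig best = candidates.at(0);
    double bestCost = measure(best);
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        const double cost = measure(candidates[i]);
        if (cost < bestCost) {
            bestCost = cost;
            best = candidates[i];
        }
    }
    return best;
}

// In-memory stand-in for a per-device kernel database,
// keyed by "<device>/<routine>", e.g. "tahiti/dgemm".
using KernelDatabase = std::map<std::string, KernelConfig>;
```

A real measure function would enqueue the kernel with the given configuration, wait for it to finish and time it (for example with timeSeconds), usually keeping the best of several runs to reduce noise.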
Open Source
clFFT
CLFFT - HTTPS://GITHUB.COM/CLMATHLIBRARIES/CLFFT
• clFFT implements an FFTW-inspired interface in OpenCL
‒ Provides a fast and accurate platform for calculating discrete FFTs
‒ Supports 1D, 2D and 3D transforms, with a batch size that can be greater than 1
‒ Supports dimension lengths that are any combination of powers of 2, 3 and 5
‒ Supports single- and double-precision floating-point formats
• clFFT assumes that the user is comfortable with OpenCL programming
‒ The host code is responsible for detecting/choosing devices, transferring memory and synchronizing operations
• The community wrote a Python wrapper for clFFT
‒ https://github.com/geggo/gpyfft
• The community wrote a Julia wrapper for clFFT
‒ https://github.com/JuliaGPU/CLFFT.jl
API
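The length rule above can be checked on the host before a plan is created. The sketch below uses hypothetical helpers (not part of clFFT) that test whether a length has the form 2^a * 3^b * 5^c and find the next such length for zero-padding; it encodes only the rule stated on this slide.

```cpp
#include <cstddef>

// True when n > 0 factors entirely into 2s, 3s and 5s (n = 2^a * 3^b * 5^c).
bool isSupportedFftLength(std::size_t n) {
    if (n == 0) return false;
    const std::size_t radices[] = {2, 3, 5};
    for (std::size_t radix : radices) {
        while (n % radix == 0) n /= radix;   // strip every factor of this radix
    }
    return n == 1;                           // anything left is an unsupported prime factor
}

// Smallest supported length >= n, useful when zero-padding an input signal.
std::size_t nextSupportedFftLength(std::size_t n) {
    if (n == 0) n = 1;
    while (!isSupportedFftLength(n)) ++n;
    return n;
}
```

For example, a 1000-sample signal (2^3 * 5^3) can be transformed as-is, while a 17-sample signal would be padded to 18.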
clFFT uses the concept of "plans", which allows the library to tune OpenCL kernels at runtime
• Users set all FFT state in a plan object when initializing
• Call clfftBakePlan() on the plan object to have the library JIT-compile the kernels outside of performance-sensitive loops
• Reuse those plans as much as possible!
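The plan lifecycle can be sketched without OpenCL. The FftPlan and FftPlanCache types below are hypothetical illustrations, not clFFT types; bake() only marks where clFFT's expensive clfftBakePlan JIT step would happen. The cache creates and bakes each distinct configuration once and returns the same plan on every later request, which is the reuse the slide recommends.

```cpp
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-in for an FFT plan: its state is fixed at creation.
struct FftPlan {
    FftPlan(std::size_t len, bool dp) : length(len), doublePrecision(dp) {}
    std::size_t length;
    bool doublePrecision;
    bool baked = false;
    int bakeCount = 0;   // how many times the expensive compile step ran

    // Placeholder for the costly JIT/compile step (clfftBakePlan in clFFT).
    void bake() {
        if (!baked) { baked = true; ++bakeCount; }
    }
};

// Creates and bakes each distinct plan once; later requests get the same object.
class FftPlanCache {
public:
    FftPlan& get(std::size_t length, bool doublePrecision) {
        const std::pair<std::size_t, bool> key(length, doublePrecision);
        auto it = plans_.find(key);
        if (it == plans_.end()) {
            it = plans_.emplace(key, FftPlan(length, doublePrecision)).first;
            it->second.bake();   // done once, outside any performance-sensitive loop
        }
        return it->second;       // std::map references stay valid across inserts
    }
    std::size_t size() const { return plans_.size(); }

private:
    std::map<std::pair<std::size_t, bool>, FftPlan> plans_;
};
```

Calling get() inside a hot loop then costs only a map lookup, while the compile cost is paid once per configuration.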
PERFORMANCE
• clFFT v2.3.1 is included in ACML v6.1
• This version contains optimizations not yet pushed to the public GitHub repository
• You can compile your application against the clFFT.h header from GitHub, then use the clFFT binary from ACML
• Benchmark system
‒ 64-bit Linux
‒ FirePro W9100
‒ Catalyst Pro 14.301.1010
‒ AMD A10-7850K
ACML 6
ACML 6 INTRODUCES HETEROGENEOUS COMPUTE
• OpenCL can be a difficult language to learn
‒ There are legacy applications that will never be ported to OpenCL
‒ Their developers may be willing to sacrifice peak performance for program portability
• ACML 6 includes clBLAS & clFFT as new backends
‒ ACML hides all OpenCL programming from end users
‒ Client programs do not need to change at all; they only need to relink against ACML 6
• When ACML determines that a particular BLAS or FFT call will benefit from offloading, it offloads the computation transparently, without the client program's knowledge
LEVERAGING CLMATH LIBRARIES TO ACCELERATE WITH OPENCL
ACML 6 keeps the same API!
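ACML's actual offload logic lives in its ACMLScript (Lua) scripts and is not shown in this deck. The sketch below is only a hypothetical illustration of the idea: a simple floating-point-operation threshold (an assumed value, not an ACML setting) decides whether an sgemm call stays on the host or is offloaded. Because the decision is made inside the library, the client's call site is unchanged.

```cpp
#include <cstddef>

enum class Backend { Cpu, Gpu };

// Hypothetical offload heuristic for an M x N x K sgemm.
// Small problems stay on the host because transfer and launch overhead
// would outweigh any GPU speedup. flopThreshold is an illustrative guess.
Backend chooseSgemmBackend(std::size_t M, std::size_t N, std::size_t K,
                           bool gpuAvailable,
                           double flopThreshold = 1.0e9) {
    // A GEMM performs about 2 * M * N * K floating-point operations.
    const double flops = 2.0 * double(M) * double(N) * double(K);
    return (gpuAvailable && flops >= flopThreshold) ? Backend::Gpu : Backend::Cpu;
}
```

A real library would derive such a crossover point from measurements on the target machine rather than from a fixed constant.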
NEW FFTW WRAPPER
ACML 6 now ships with fftw.h
FFTW programs can link with ACML 6 to offload computation onto OpenCL devices
No changes to host code required!
ACMLSCRIPT
ACML includes a new scripting language that expresses the logic ACML uses to offload computation
• The scripting language is Lua, extended with custom ACML callback functions
• http://www.lua.org/
• Refer to chapter 7 of the ACML documentation for more information on how to modify or create your own scripts
ACMLSCRIPT: 3-PART VIDEO TUTORIALS
ACMLScript: Part 1
ACMLScript: Part 2
ACMLScript: Part 3
HTTPS://WWW.YOUTUBE.COM/USER/AMDDEVCENTRAL
ACML
PERFORMANCE
• ACML v6.0 sgemm
‒ These results are slightly out of date
• Notice that the green line equals max(blue, red)
‒ ACML runs the computation on the host processor when the problem is too small to benefit from GPU acceleration
• Benchmark system
‒ AMD A10-7850K
‒ CPU & GPU
‒ 64-bit Linux
‒ Catalyst 14.301.1001
Open Source
clMagma
CLMAGMA
clMAGMA implements LAPACK functionality with
OpenCL acceleration
https://bitbucket.org/icl/clmagma
Maintained by the University of Tennessee, Knoxville
• The newest release, v1.3, supports
‒ LU, QR and Cholesky factorizations
‒ Linear and least-squares solvers
‒ Reductions to Hessenberg, bidiagonal and tridiagonal forms
‒ Eigenvalue and singular value problem solvers
‒ Orthogonal transformation routines
• clMAGMA uses clBLAS as its GPU compute backend
‒ It currently provides static load balancing between CPU & GPU cores
• Multi-GPU support
LEVERAGING CLMATH LIBRARIES TO ACCELERATE WITH OPENCL
v1.3 adds support for Windows and Mac OS X
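Static load balancing means the division of work between CPU and GPU is fixed before the computation starts rather than adjusted while it runs. The slide does not describe how clMAGMA actually partitions its work, so the sketch below is a hypothetical illustration only: it splits n matrix columns between the two devices in proportion to assumed throughput figures.

```cpp
#include <cstddef>

struct ColumnSplit {
    std::size_t cpuColumns;
    std::size_t gpuColumns;
};

// GPU share of the work, proportional to (assumed positive) throughputs.
double gpuShare(double cpuGflops, double gpuGflops) {
    return gpuGflops / (cpuGflops + gpuGflops);
}

// Hypothetical static split: decided once, up front, and never rebalanced.
ColumnSplit staticSplit(std::size_t n, double gpuFraction) {
    if (gpuFraction < 0.0) gpuFraction = 0.0;   // clamp to [0, 1]
    if (gpuFraction > 1.0) gpuFraction = 1.0;
    const std::size_t gpu = static_cast<std::size_t>(gpuFraction * double(n) + 0.5);
    return {n - gpu, gpu};
}
```

The trade-off of a static scheme is simplicity and zero runtime overhead, at the cost of idle time when the assumed throughput ratio does not match the real one.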
Open Source
Bolt
BOLT
Bolt implements parallel C++ STL functionality with C++ AMP & OpenCL acceleration
Bolt on GitHub
Maintained by AMD
• Bolt provides containers and algorithms that let clients accelerate C++ code with minimal GPU knowledge
‒ Sorts
‒ Reductions
‒ Transforms
‒ Scans
• Through control structures, clients control where data is allocated and computed (some knowledge of C++ AMP or OpenCL helps here)
• Bolt provides both OpenCL & C++ AMP paths
PARALLEL STL
Bolt provides containers such as bolt::cl::device_vector<>
#include <bolt/cl/device_vector.h>
#include <bolt/cl/scan.h>
#include <vector>
#include <numeric>

int main()
{
    size_t length = 1024;

    // Create a device_vector and initialize every element to 1
    bolt::cl::device_vector< int > boltInput( length, 1 );

    // Calculate the inclusive_scan of the device_vector in place
    bolt::cl::inclusive_scan( boltInput.begin( ), boltInput.end( ), boltInput.begin( ) );

    // Create a std::vector and initialize every element to 1
    std::vector< int > stdInput( length, 1 );

    // Calculate the inclusive_scan of the std::vector in place
    bolt::cl::inclusive_scan( stdInput.begin( ), stdInput.end( ), stdInput.begin( ) );

    return 0;
}
EXAMPLE CODE
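To check the Bolt results on the host, the same inclusive scan can be computed with the standard library. The helper name referenceInclusiveScan is hypothetical; std::partial_sum (declared in <numeric>, which the example already includes) performs an inclusive scan, out[i] = in[0] + ... + in[i], so for the 1024-element vector of ones both Bolt calls should produce 1, 2, ..., 1024.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Host-only reference for the Bolt example: an inclusive scan,
// computed with std::partial_sum.
std::vector<int> referenceInclusiveScan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}
```

Comparing this output element by element with the contents of boltInput (after copying it back to the host) is a simple way to validate the device path.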
Q&A & CONTACT INFO
For More Info:
Follow us on Twitter: @AMDDevCentral
Visit our forums: http://devgurus.amd.com/welcome
Visit our website: www.developer.amd.com
Watch the replay: www.youtube.com/user/AMDDevCentral
Download the presentation: www.slideshare.net/DevCentralAMD
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software
changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD
reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of
such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2014 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices,
Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablement
 
Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI Supercomputer
 
And Then There Were None
And Then There Were NoneAnd Then There Were None
And Then There Were None
 
AI LAB using IBM Power 9 Processor
AI LAB using IBM Power 9 ProcessorAI LAB using IBM Power 9 Processor
AI LAB using IBM Power 9 Processor
 
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
 
OpenNebulaConf 2014 - Using Ceph to provide scalable storage for OpenNebula -...
OpenNebulaConf 2014 - Using Ceph to provide scalable storage for OpenNebula -...OpenNebulaConf 2014 - Using Ceph to provide scalable storage for OpenNebula -...
OpenNebulaConf 2014 - Using Ceph to provide scalable storage for OpenNebula -...
 
OpenPOWER Webinar
OpenPOWER Webinar OpenPOWER Webinar
OpenPOWER Webinar
 
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Experiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah Watkins
 
Common Design of Deep Learning Frameworks
Common Design of Deep Learning FrameworksCommon Design of Deep Learning Frameworks
Common Design of Deep Learning Frameworks
 

Plus de AMD Developer Central

RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14AMD Developer Central
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...AMD Developer Central
 
Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14AMD Developer Central
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14AMD Developer Central
 
Keynote (Tony King-Smith) - Silicon? Check. HSA? Check. All done? Wrong! - by...
Keynote (Tony King-Smith) - Silicon? Check. HSA? Check. All done? Wrong! - by...Keynote (Tony King-Smith) - Silicon? Check. HSA? Check. All done? Wrong! - by...
Keynote (Tony King-Smith) - Silicon? Check. HSA? Check. All done? Wrong! - by...AMD Developer Central
 
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...AMD Developer Central
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...AMD Developer Central
 
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...AMD Developer Central
 

Plus de AMD Developer Central (9)

Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
 
Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14
 
Keynote (Tony King-Smith) - Silicon? Check. HSA? Check. All done? Wrong! - by...
Keynote (Tony King-Smith) - Silicon? Check. HSA? Check. All done? Wrong! - by...Keynote (Tony King-Smith) - Silicon? Check. HSA? Check. All done? Wrong! - by...
Keynote (Tony King-Smith) - Silicon? Check. HSA? Check. All done? Wrong! - by...
 
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
 
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
 

Dernier

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Leverage the Speed of OpenCL™ with AMD Math Libraries

  • 2. 2 | HETEROGENEOUS MATH LIBRARIES | DECEMBER 16, 2014
    AGENDA: LIBRARIES COVERED, a survey of available libraries
    clMATH
      ‒ clBLAS
      ‒ clFFT
    ACML
    clMAGMA
    Bolt
  • 3. CLMATHLIBRARIES
    clMathLibraries is a GitHub organization for OpenCL™ math-related subprojects: https://github.com/clMathLibraries
    It currently hosts two subprojects: clBLAS & clFFT
  • 5. CLBLAS - HTTPS://GITHUB.COM/CLMATHLIBRARIES/CLBLAS (API)
    • clBLAS implements the Netlib BLAS functionality with OpenCL
      ‒ Level 3: matrix-matrix operations, O(N^3), compute bound
      ‒ Level 2: matrix-vector operations, O(N^2), mostly memory bound
      ‒ Level 1: vector-vector operations, O(N), memory bound
    • The API follows the Netlib style, but appends OpenCL structures:
        clblasStatus clblasSgemm( clblasOrder order, clblasTranspose transA, clblasTranspose transB,
                                  size_t M, size_t N, size_t K, cl_float alpha,
                                  const cl_mem A, size_t offA, size_t lda,
                                  const cl_mem B, size_t offB, size_t ldb,
                                  cl_float beta, cl_mem C, size_t offC, size_t ldc,
                                  cl_uint numCommandQueues, cl_command_queue* commandQueues,
                                  cl_uint numEventsInWaitList, const cl_event* eventWaitList,
                                  cl_event* events )
    • clBLAS assumes that the user is comfortable with OpenCL programming
      ‒ The host code is responsible for detecting/choosing devices, transferring memory and synchronizing operations
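To make the Netlib-style parameter list concrete, the following is a minimal host-side C++ sketch of the computation an sgemm call performs, C = alpha·op(A)·op(B) + beta·C, written as a CPU reference that could be used to validate GPU results. It assumes that offA/offB/offC are counted in elements and that each matrix is addressed through its leading dimension; the names reference_sgemm, Order and Transpose are hypothetical stand-ins, not clBLAS types, and no OpenCL calls are made.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical enums standing in for clblasOrder / clblasTranspose (not the clBLAS types).
enum class Order { RowMajor, ColumnMajor };
enum class Transpose { None, Transposed };

// Element (row, col) of a matrix stored in `buf`, starting at element offset `off`,
// with leading dimension `ld`.
inline float at(const std::vector<float>& buf, std::size_t off, std::size_t ld,
                Order order, std::size_t row, std::size_t col) {
    return order == Order::ColumnMajor ? buf[off + row + col * ld]
                                       : buf[off + row * ld + col];
}

// C = alpha * op(A) * op(B) + beta * C, with op(A) M x K, op(B) K x N, C M x N.
void reference_sgemm(Order order, Transpose transA, Transpose transB,
                     std::size_t M, std::size_t N, std::size_t K, float alpha,
                     const std::vector<float>& A, std::size_t offA, std::size_t lda,
                     const std::vector<float>& B, std::size_t offB, std::size_t ldb,
                     float beta, std::vector<float>& C, std::size_t offC, std::size_t ldc) {
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                // op(A)(i,k) reads A(k,i) when A is transposed; likewise for B.
                float a = transA == Transpose::None ? at(A, offA, lda, order, i, k)
                                                    : at(A, offA, lda, order, k, i);
                float b = transB == Transpose::None ? at(B, offB, ldb, order, k, j)
                                                    : at(B, offB, ldb, order, j, k);
                acc += a * b;
            }
            std::size_t c = order == Order::ColumnMajor ? offC + i + j * ldc
                                                        : offC + i * ldc + j;
            C[c] = alpha * acc + beta * C[c];
        }
    }
}
```

For example, with row-major A = {1, 2, 3, 4}, B = {5, 6, 7, 8}, alpha = 1 and beta = 0, the 2 x 2 result is {19, 22, 43, 50}. The triple loop is deliberately naive; it trades speed for an obvious correspondence with the BLAS definition.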
  • 6. CLBLAS - HTTPS://GITHUB.COM/CLMATHLIBRARIES/CLBLAS (API)
    • A proof-of-concept Python wrapper for clBLAS was started, but so far only sgemm is wrapped
      ‒ https://github.com/clMathLibraries/clBLAS/tree/master/src/wrappers/python
      ‒ Based on Cython
      ‒ Works with PyOpenCL to manage OpenCL state
      ‒ We would welcome help from the community to finish it
    • The community wrote a Julia wrapper for clBLAS
      ‒ https://github.com/JuliaGPU/CLBLAS.jl
  • 7. CLBLAS - HTTPS://GITHUB.COM/CLMATHLIBRARIES/CLBLAS
    clBLAS contains a Tune tool for finding better OpenCL kernels
    • The user is responsible for running the tool on their machine as a preprocessing step
    • The tool creates a kernel database file (.kdb) that contains the best-performing kernel for a given BLAS routine
    • The .kdb file is specific to an OpenCL device and is named after that device, e.g. tahiti.kdb
    • Example:
        export CLBLAS_STORAGE_PATH=/usr/local/lib
        ./tune --gemm --double
  • 9. CLFFT - HTTPS://GITHUB.COM/CLMATHLIBRARIES/CLFFT (API)
    • clFFT implements an FFTW-inspired interface with OpenCL
      ‒ Provides a fast and accurate platform for calculating discrete FFTs
      ‒ Supports 1D, 2D, and 3D transforms with a batch size that can be greater than 1
      ‒ Supports dimension lengths that are any product of powers of 2, 3, and 5
      ‒ Supports single- and double-precision floating-point formats
    • clFFT assumes that the user is comfortable with OpenCL programming
      ‒ The host code is responsible for detecting/choosing devices, transferring memory and synchronizing operations
    • The community wrote a Python wrapper for clFFT
      ‒ https://github.com/geggo/gpyfft
    • The community wrote a Julia wrapper for clFFT
      ‒ https://github.com/JuliaGPU/CLFFT.jl
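The length restriction above means every transform length must factor as 2^a · 3^b · 5^c. A minimal C++ sketch of a host-side check, plus a helper that finds the next supported length (useful where zero-padding is acceptable for the application), is shown below; is_235_length and next_235_length are hypothetical helpers, not clFFT functions.

```cpp
#include <cstddef>

// True if n > 0 and n = 2^a * 3^b * 5^c for some a, b, c >= 0.
bool is_235_length(std::size_t n) {
    if (n == 0) return false;
    const std::size_t radices[] = {2, 3, 5};
    for (std::size_t p : radices) {
        while (n % p == 0) n /= p;  // strip every factor of p
    }
    return n == 1;  // anything left is a prime factor other than 2, 3 or 5
}

// Smallest supported length >= n, e.g. for zero-padding a signal.
std::size_t next_235_length(std::size_t n) {
    std::size_t m = (n == 0) ? 1 : n;
    while (!is_235_length(m)) ++m;
    return m;
}
```

For instance, 1000 = 2^3 · 5^3 is accepted, 14 = 2 · 7 is not, and next_235_length(1001) returns 1024.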
  • 10. CLFFT - HTTPS://GITHUB.COM/CLMATHLIBRARIES/CLFFT
    clFFT uses the concept of 'plans', which allows the library to tune OpenCL kernels at runtime
    • Users set all FFT state in an FFT plan object when initializing
    • Call 'BakePlan' (clfftBakePlan) with the plan object to have the library JIT-compile the kernels outside of performance-sensitive loops
    • Reuse those plans as much as possible!
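The plan pattern separates expensive one-time setup from the repeated execute step. The sketch below illustrates that idea on the host only: a hypothetical DftPlan class whose bake() precomputes the twiddle factors once, after which execute() can run many times without redoing that work. It swaps in a naive O(N^2) DFT for clFFT's generated OpenCL kernels and is not the clFFT API.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side illustration of the plan pattern; not the clFFT API.
class DftPlan {
public:
    explicit DftPlan(std::size_t n) : n_(n) {}

    // "Bake" step: do the expensive setup once (here, the n roots of unity).
    void bake() {
        const double pi = std::acos(-1.0);
        twiddles_.resize(n_);
        for (std::size_t k = 0; k < n_; ++k)
            twiddles_[k] = std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(n_));
        baked_ = true;
    }

    // Forward DFT X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n), reusing the baked twiddles.
    std::vector<std::complex<double>> execute(const std::vector<std::complex<double>>& x) const {
        if (!baked_) throw std::logic_error("bake() must be called before execute()");
        if (x.size() != n_) throw std::invalid_argument("input length does not match plan");
        std::vector<std::complex<double>> out(n_);
        for (std::size_t k = 0; k < n_; ++k) {
            std::complex<double> acc(0.0, 0.0);
            for (std::size_t j = 0; j < n_; ++j)
                acc += x[j] * twiddles_[(j * k) % n_];
            out[k] = acc;
        }
        return out;
    }

private:
    std::size_t n_;
    bool baked_ = false;
    std::vector<std::complex<double>> twiddles_;
};
```

Typical use is to construct and bake the plan once before a loop, then call execute() inside the loop.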
  • 11. CLFFT - HTTPS://GITHUB.COM/CLMATHLIBRARIES/CLFFT PERFORMANCE
    • clFFT v2.3.1 is included in ACML v6.1
    • This version contains optimizations not yet pushed to the public GitHub repo
    • You can compile your application against the clFFT.h header file from GitHub, then link against the binary from ACML
    • Benchmark system
      ‒ 64-bit Linux
      ‒ FirePro W9100
      ‒ Catalyst Pro 14.301.1010
      ‒ AMD A10-7850K
  • 13. ACML 6 INTRODUCES HETEROGENEOUS COMPUTE: LEVERAGING CLMATH LIBRARIES TO ACCELERATE WITH OPENCL
    ACML 6 keeps the same API!
    • OpenCL can be a difficult language to learn
      ‒ There are legacy applications that won't be ported to OpenCL
      ‒ Their maintainers might be willing to sacrifice peak performance for program portability
    • ACML 6 includes clBLAS & clFFT as new backends
      ‒ ACML hides all OpenCL programming from end users
      ‒ Client programs do not need to change at all; they only need to be relinked against ACML 6
    • When ACML determines that a particular BLAS or FFT call will benefit from offloading computation, it offloads it transparently, without the client program's knowledge
  • 14. NEW FFTW WRAPPER
    • ACML 6 now ships with fftw.h
    • FFTW programs can link with ACML 6 to offload computation onto OpenCL devices
    • No changes in host code required!
  • 15. ACMLSCRIPT
    ACML includes a new scripting language that expresses the logic ACML uses to offload computation
    • The scripting language is based on Lua, with custom ACML callback functions
      ‒ http://www.lua.org/
    • Refer to chapter 7 of the ACML documentation for more information on how to modify the existing scripts or create your own
  • 16. ACMLSCRIPT: 3-PART VIDEO TUTORIALS
    ACMLScript: Part 1, ACMLScript: Part 2, ACMLScript: Part 3
    HTTPS://WWW.YOUTUBE.COM/USER/AMDDEVCENTRAL
  • 17. ACML PERFORMANCE
    • ACML v6.0 sgemm
      ‒ Slightly dated at this point
    • Notice that the green line is equivalent to Max(blue, red)
      ‒ ACML runs on the host processor when the problem is too small to benefit from GPU acceleration
    • Benchmark system
      ‒ AMD A10-7850K
      ‒ CPU & GPU
      ‒ 64-bit Linux
      ‒ Catalyst 14.301.1001
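The "green line = Max(blue, red)" behaviour can be pictured as a size-based dispatch: small problems stay on the host, large ones go to the GPU. The sketch below is a hypothetical C++ illustration of such a rule for sgemm, using the standard operation count of 2·M·N·K floating-point operations for a GEMM. The crossover value is an assumed tuning parameter, the Backend enum and function names are invented for illustration, and ACML's actual decision logic (expressed in its ACMLScript files) is not reproduced here.

```cpp
#include <cstddef>

// Hypothetical backend selector; not an ACML interface.
enum class Backend { Host, Device };

// Operation count of C = alpha*op(A)*op(B) + beta*C:
// M*N*K multiply-add pairs, i.e. 2*M*N*K operations (the beta*C update is ignored).
double gemm_flops(std::size_t M, std::size_t N, std::size_t K) {
    return 2.0 * static_cast<double>(M) * static_cast<double>(N) * static_cast<double>(K);
}

// Offload only when a device exists and the work exceeds an assumed crossover point,
// below which transfer and launch overheads would be expected to dominate.
Backend choose_sgemm_backend(std::size_t M, std::size_t N, std::size_t K,
                             bool device_available, double crossover_flops) {
    if (!device_available) return Backend::Host;
    return gemm_flops(M, N, K) >= crossover_flops ? Backend::Device : Backend::Host;
}

// Throughput in GFLOPS for a measured run, the usual way sgemm performance is reported.
double gemm_gflops(std::size_t M, std::size_t N, std::size_t K, double seconds) {
    return gemm_flops(M, N, K) / seconds / 1e9;
}
```

With an assumed crossover of 1e9 operations, a 64 x 64 x 64 multiply stays on the host while a 4096 x 4096 x 4096 multiply is sent to the device.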
  • 19. CLMAGMA
    clMAGMA implements LAPACK functionality with OpenCL acceleration
    • https://bitbucket.org/icl/clmagma
    • Maintained by the University of Tennessee, Knoxville
  • 20. CLMAGMA: LEVERAGING CLMATH LIBRARIES TO ACCELERATE WITH OPENCL
    v1.3 adds support for Windows and Mac OS X
    • The newest version, v1.3, supports
      ‒ LU, QR and Cholesky factorizations
      ‒ Linear and least-squares solvers
      ‒ Reductions to Hessenberg, bidiagonal and tridiagonal forms
      ‒ Eigenvalue and singular value problem solvers
      ‒ Orthogonal transformation routines
    • clMAGMA uses clBLAS as the GPU compute backend
      ‒ It currently provides static load balancing between CPU & GPU cores
    • Multi-GPU support
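To show what one of the listed factorizations computes, here is a minimal unblocked Cholesky factorization (A = L·L^T for a symmetric positive-definite A) in C++. It is a small CPU reference only: clMAGMA's routines use blocked, hybrid CPU/GPU algorithms built on clBLAS, and LAPACK stores matrices column-major, whereas this hypothetical cholesky_lower helper assumes row-major storage for readability.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns the lower-triangular L (row-major, upper triangle zero) with A = L * L^T,
// for a symmetric positive-definite n x n matrix `a` stored row-major.
std::vector<double> cholesky_lower(const std::vector<double>& a, std::size_t n) {
    if (a.size() != n * n) throw std::invalid_argument("matrix must be n x n");
    std::vector<double> L(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        // Diagonal entry: L(j,j) = sqrt(A(j,j) - sum_k L(j,k)^2).
        double d = a[j * n + j];
        for (std::size_t k = 0; k < j; ++k) d -= L[j * n + k] * L[j * n + k];
        if (d <= 0.0) throw std::domain_error("matrix is not positive definite");
        L[j * n + j] = std::sqrt(d);
        // Entries below the diagonal in column j.
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i * n + k] * L[j * n + k];
            L[i * n + j] = s / L[j * n + j];
        }
    }
    return L;
}
```

For A = {4, 12, -16, 12, 37, -43, -16, -43, 98} the result is L = {2, 0, 0, 6, 1, 0, -8, 5, 3}.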
  • 22. BOLT
    Bolt implements parallel C++ STL functionality with C++ AMP & OpenCL acceleration
    • Bolt on GitHub
    • Maintained by AMD
  • 23. BOLT: PARALLEL STL
    Bolt provides containers such as bolt::device_vector<>
    • Bolt provides containers and algorithms that enable clients to accelerate C++ code with minimal GPU knowledge
      ‒ Sorts
      ‒ Reductions
      ‒ Transforms
      ‒ Scans
    • Through control structures, clients control where data is allocated and computed (minimal knowledge of C++ AMP or OpenCL is helpful here)
    • Bolt provides support for both OpenCL & C++ AMP paths
  • 24. BOLT: EXAMPLE CODE

```cpp
#include <bolt/cl/device_vector.h>
#include <bolt/cl/scan.h>
#include <vector>
#include <numeric>

int main()
{
    size_t length = 1024;

    // Create a device_vector and initialize every element to 1
    bolt::cl::device_vector< int > boltInput( length, 1 );

    // Calculate the inclusive_scan of the device_vector, in place
    bolt::cl::inclusive_scan( boltInput.begin( ), boltInput.end( ), boltInput.begin( ) );

    // Create a std::vector and initialize every element to 1
    std::vector< int > stdInput( length, 1 );

    // Calculate the inclusive_scan of the std::vector, in place
    bolt::cl::inclusive_scan( stdInput.begin( ), stdInput.end( ), stdInput.begin( ) );

    return 0;
}
```
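The standard library can produce a host-side reference for checking the example's output. Assuming bolt::cl::inclusive_scan follows the standard inclusive-scan definition (the one std::partial_sum implements), scanning 1024 ones should yield 1, 2, ..., 1024. The sketch below computes that reference without Bolt; reference_inclusive_scan_of_ones is a hypothetical helper name.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Host-only reference: inclusive prefix sum of `length` ones, computed in place.
std::vector<int> reference_inclusive_scan_of_ones(std::size_t length) {
    std::vector<int> v(length, 1);
    std::partial_sum(v.begin(), v.end(), v.begin());  // v[i] = v[0] + ... + v[i]
    return v;
}
```

Comparing this vector element by element with stdInput after the Bolt call is a simple correctness check.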
  • 25. Q&A & CONTACT INFO
    For more info:
    • Follow us on Twitter: @AMDDevCentral
    • Visit our forums: http://devgurus.amd.com/welcome
    • Visit our website: www.developer.amd.com
    • Watch the replay: www.youtube.com/user/AMDDevCentral
    • Download the presentation: www.slideshare.net/DevCentralAMD
  • 26. DISCLAIMER & ATTRIBUTION
    The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
    ATTRIBUTION
    © 2014 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.