04/01/14 1
Establishing a CUDA Research Center at
Penn State: Perspectives on GPU-Enabled
Teaching and Research
William J. Brouwer (wjb19@psu.edu)
Pierre-Yves Taunay (py.taunay@psu.edu)
Research Computing and Cyberinfrastructure
The Pennsylvania State University
Nvidia GTC 2014
Outline
● Center Overview (RCC @ PSU)
● GPU accelerated research
● IceCube
● Metabolic Networks (Fsolve/cuSolve)
● MD + Simulated Annealing
● FQHE (LU Decomposition)
● Smart Proppants (QR Decomposition)
● GPU cluster scaling
● Amber
● PetaChem
● Quantum Espresso
– Lanczos Diagonalization
● CUDA, needs + wants
● Summary
Center Overview
● Research Computing and Cyberinfrastructure (RCC) at PSU provides high performance computing services:
– Hardware, proprietary/open source software
– Consultation (numerical/algorithmic, software development, etc.)
● PhDs, system admins and programmers work together to provide these services to academics while performing independent research
● Many users are interested in using GPUs for science and engineering research applications; we are a CUDA Research Center
https://research.nvidia.com/content/penn-state-crc-summary
● Formerly under ITS, currently being incorporated into the Office of the Vice President for Research (OVPR)
Center Overview
● Hardware is ~12k CPU cores, 64 GPUs (Fermi), and several Kepler GPUs
● Red Hat Linux, scheduling via PBS/Moab/Torque
● Usual monitoring/management tools, e.g., Puppet, Jenkins, Nagios, Ganglia, and some custom solution(s) (e.g., CLPR)
● Serve ~7k users across all campuses in the commonwealth
● Use CUDA predominantly, although growing numbers of users are trying OpenACC, OpenCL, libraries, etc.
● Environment modules system
Center Overview
● Support many GPU accelerated applications
IceCube
Metabolic Networks
● Optimal models for the metabolic networks of microbial organisms are important in the pharma and energy industries
● Ensemble Modeling (EM) is used to construct chemical kinetics of microbial organisms → decompose metabolic reactions into the elementary mechanisms, which are ODE systems f(k_i, y_j) = dy_j/dt
● Overall approach maximizes correlation between model predictions and experimental measurements, performed in steady state → solve f(k,y) = 0
Metabolic Networks
● [CPU] parse equations f(k,y)
● [CPU] differentiate f(k,y), create analytic Jacobian J(k,y)
● [CPU] populate data structures representing f(k,y), J(k,y), copy to GPU
● [GPU] Iterate (Newton-Raphson) →
– Numerically evaluate f(k,y) and J(k,y) by parallel reduction
– Solve for delta in J(k,y) . delta = -f(k,y) using GMRES
– Update y += delta and repeat until ||f(k,y)|| < tol
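The Newton-Raphson loop above can be sketched on the CPU in a few lines. This is only an illustration under stated assumptions: the two-species system `f`, its rate constants `k`, and the dense Gaussian-elimination solver are hypothetical stand-ins; the slides evaluate f and J on the GPU and solve the linear system with GMRES.

```python
# Minimal Newton-Raphson sketch for a steady-state system f(k, y) = 0.
# The toy kinetics and the dense solver are illustrative assumptions only.

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def f(k, y):
    """Hypothetical kinetics: constant feed -> y0 -> y1 -> quadratic decay."""
    return [k[0] - k[1] * y[0],
            k[1] * y[0] - k[2] * y[1] ** 2]

def jacobian(k, y):
    """Analytic Jacobian J[i][j] = d f_i / d y_j of the toy system."""
    return [[-k[1], 0.0],
            [k[1], -2.0 * k[2] * y[1]]]

def newton_raphson(k, y, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        fy = f(k, y)
        if max(abs(v) for v in fy) < tol:  # max-norm stopping test
            break
        # Newton step: J(k, y) . delta = -f(k, y)
        delta = solve_linear(jacobian(k, y), [-v for v in fy])
        y = [yi + di for yi, di in zip(y, delta)]
    return y

k = [2.0, 1.0, 0.5]
y_steady = newton_raphson(k, [1.0, 1.0])
```

For these assumed rate constants the steady state is y0 = k0/k1 and y1 = sqrt(k0/k2), which gives a simple check on the result.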
Metabolic Networks
● Solution uses various libraries including Boost, Thrust, CUSP and CUDA
● Matrices are sparse and poorly conditioned, but the solution works well for O(10^2) equations
● Currently working to scale to larger, more interesting networks and microbial organisms
● CuSolve is a work in progress: a GPU-only ODE solver for stiff equations
Molecular Dynamics + Sim Anneal
● Solve for MD potentials by fitting experimental data for the structure factor
● Optimization surface is highly non-convex → use simulated annealing; each GPU performs an independent MD run
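A rough sketch of the annealing strategy, assuming a toy one-dimensional `objective` in place of the real misfit between MD output and the measured structure factor. The function, cooling schedule, and step size are all hypothetical; the independent seeded chains stand in for one run per GPU.

```python
import math
import random

def objective(x):
    """Hypothetical non-convex misfit surface with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)

def anneal(x0, seed, steps=2000, t_start=1.0, t_end=1e-3, step_size=0.5):
    """One independent annealing chain; returns the best point it visited."""
    rng = random.Random(seed)
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    for i in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        cand = x + rng.uniform(-step_size, step_size)
        fc = objective(cand)
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

# Independent chains, standing in for one run per GPU.
x_start = 4.0
chains = [anneal(x_start, seed) for seed in range(8)]
best_x, best_f = min(chains, key=lambda c: c[1])
```

Because the chains share no state, they can run concurrently and only the final minimum needs to be gathered.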
LU Decomposition
● Batch LU decomposition developed for the fractional quantum Hall effect (FQHE), fundamental physics that has implications in quantum computation and materials science
● O(N!) determinants need to be evaluated in constructing the wavefunction; the process is repeated many times in a Monte Carlo calculation
● Small, dense matrices of side <= 512
● Implementation exploits SIMD architecture and parallel reduction
● Example: N=11, 1024 Monte Carlo iterations; computation time using 8 GPU devices (w/ MPI) is ~246 seconds, down from ~31488 seconds on a single CPU (about 128x)
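To make the determinant step concrete, here is a minimal serial sketch of a determinant computed through LU elimination with partial pivoting, applied to a batch of small matrices. The function names and sample matrices are assumptions for illustration; the GPU implementation described above is not shown in the slides.

```python
def lu_determinant(a):
    """Determinant of a small dense matrix via LU with partial pivoting."""
    n = len(a)
    m = [row[:] for row in a]  # work on a copy
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0  # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # each row swap flips the sign
        det *= m[col][col]  # product of the pivots (diagonal of U)
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col + 1, n):
                m[r][c] -= factor * m[col][c]
    return det

def batch_determinants(batch):
    """Each matrix is independent of the others, so the batch parallelizes."""
    return [lu_determinant(a) for a in batch]

batch = [
    [[4.0, 3.0], [6.0, 3.0]],
    [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]],
    [[1.0, 2.0], [2.0, 4.0]],
]
dets = batch_determinants(batch)
```

The three sample matrices have determinants -6, 24, and 0 respectively, which can be confirmed by hand.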
QR Decomposition
● Proppant materials are used to stabilize fissures created during hydraulic fracturing
● 'Smart proppants' are essentially electrical dipoles which may absorb and re-emit EM energy, irradiated and recorded by downhole instrumentation
● This work considers an iteration-free solution to this EM scattering problem, using linear algebra including LU decomposition and SVD
● SVD can be performed using the QR algorithm, which is in turn built on QR decomposition
● Devised a unique approach for large batches of small dense matrices using Givens rotations; largely independent ops, maps well to GPU
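For reference, a textbook Givens-rotation QR factorization of one small dense matrix might look like the sketch below. It is a plain serial version, not the batched GPU scheme from the talk, and `givens_qr` and the sample matrix are illustrative names and data.

```python
import math

def givens_qr(a):
    """QR factorization of a small dense matrix using Givens rotations.

    Returns (q, r) with q orthogonal and r upper triangular, so q r = a.
    """
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for col in range(n):
        for row in range(m - 1, col, -1):
            x, y = r[row - 1][col], r[row][col]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # nothing to eliminate
            c, s = x / h, y / h
            # Rotate rows (row-1, row) of r to zero out r[row][col].
            for j in range(n):
                u, v = r[row - 1][j], r[row][j]
                r[row - 1][j] = c * u + s * v
                r[row][j] = -s * u + c * v
            # Accumulate the transposed rotation into the columns of q.
            for i in range(m):
                u, v = q[i][row - 1], q[i][row]
                q[i][row - 1] = c * u + s * v
                q[i][row] = -s * u + c * v
    return q, r

a = [[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]]
q, r = givens_qr(a)
# Reconstruct a from the factors as a sanity check.
qr = [[sum(q[i][k] * r[k][j] for k in range(3)) for j in range(3)]
      for i in range(3)]
```

Each rotation touches only two rows, so rotations acting on disjoint row pairs, and all matrices in a batch, can be processed independently.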
GPU Cluster Scaling
● Several key GPU accelerated software suites were tested using
multiple GPUs across two clusters
Cluster                    Lion-GA                            Stampede
CPU                        12 X5675 @ 3.07 GHz                16 E5-2680 @ 2.70 GHz
GPU                        8 M2070 or 8 M2090                 1 K20c
Nodes equipped with GPUs   8                                  120
Interconnect               40 Gb/s Mellanox QDR Infiniband    56 Gb/s Mellanox FDR Infiniband
GPU Cluster Scaling
● Lion-GA cluster has 3 GPUs per PCIe switch, 3 to 5 GPUs per IOH chip
● IOH doesn't support peer-to-peer transfers between GPU devices on different chipsets
● Difficult to achieve peak transfer rates between GPUs on different sockets
Amber
● Molecular Dynamics is widely used for simulation of solvated proteins or molecules and makes use of various force fields (AMBER, ReaxFF, etc.)
● AMBER force field is implemented in the eponymous software suite
● The software PMEMD in AMBER is used for both explicit solvent Particle Mesh Ewald (PME) and implicit solvent Generalized Born (GB) simulations
● AMBER does not require extensive communication between GPUs or between CPU and GPU, and does not take advantage of the CPU if GPUs are used
● GPU acceleration allows for longer simulation times, on the order of a nanosecond or more
Amber
Achieved performance on Lion-GA in ns/day; each chart compares 12 X5675 with 2, 4, 6, and 8 M2090
[Figure: PME simulation of DHFR protein in water (NPT ensemble, 23,558 atoms); axis 0 to 80 ns/day]
[Figure: PME simulation of FactorIX molecule in water (NPT ensemble, 90,906 atoms); axis 0 to 18 ns/day]
[Figure: PME simulation of Cellulose molecule in water (NPT ensemble, 408,609 atoms); axis 0 to 4.5 ns/day]
[Figure: Implicit solvent GB simulation of Myoglobin (2,492 atoms); axis 0 to 200 ns/day]
[Figure: Implicit solvent GB simulation of Nucleosome (25,095 atoms); axis 0 to 7 ns/day]
PetaChem
● Quantum chemistry package designed to run on NVIDIA hardware
● Features restricted Hartree-Fock and grid-based Kohn-Sham single point energy and gradient calculations
● Various functions supported: geometry optimization, ab initio molecular dynamics, multi-GPU support
● Benchmark: single point energy for Olestra, using the 6-31G basis set
PetaChem
[Figure: PetaChem Olestra SCF calculation; total walltime (s) on Lion-GA for 1, 3, 5, and 7 M2070; axis 0 to 600 s]
Quantum Espresso
● Density Functional Theory (DFT) has enjoyed huge growth in popularity owing to computational and numerical advancements; used widely in materials science
● Quantum Espresso (QE) is an open source DFT package that has recently added GPU acceleration, largely through BLAS and FFT routines
● When building QE with MAGMA (UT/ORNL) or phiGEMM, one introduces heterogeneous CPU/GPU linear algebra routines
● Benchmark:
– Self-consistent field calculation using PBE pseudopotentials, 168 atoms (cellulose)
– Periodic boundary conditions, kinetic energy cutoff for charge density of 80 Ry, Davidson diagonalization
Quantum Espresso
[Figure: SCF calculation for cellulose; total walltime (hrs) on Stampede@TACC for 1, 2, 4, 8, 16, and 32 K20; axis 0 to 7 hrs]
Lanczos Diagonalization
● A key task in many applications, especially quantum chemistry and DFT, is diagonalization, i.e., matrix eigendecomposition
● Lanczos is a power-type (Krylov subspace) method that produces a tridiagonal matrix, which is more readily solvable; it consists of many matrix-vector operations, very amenable to GPU; currently using cuBLAS & MKL in a heterogeneous solution
● Originally devised for a fundamental physics project at PSU, now intended for incorporation into the GPU-Quantum Espresso project being led by Filippo Spiga
● Attempting to scale to multiple devices using MPI + GPUDirect, still beset by some numerical/convergence problems with increasing matrix size
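A bare-bones version of the Lanczos recurrence is sketched below for a small symmetric matrix. It is pure Python with no reorthogonalization, so it is only meant to show the structure of the iteration; the heterogeneous cuBLAS/MKL implementation mentioned above is not reproduced, and `matvec`, `lanczos`, and the sample matrix are illustrative.

```python
import math

def matvec(a, v):
    """Dense matrix-vector product, the dominant cost of each step."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def lanczos(a, v0, m):
    """Run up to m Lanczos steps on the symmetric matrix a.

    Returns (alpha, beta): the diagonal and off-diagonal of the
    tridiagonal matrix T built from the Krylov basis.
    """
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v0)
    alpha, beta = [], []
    b = 0.0
    for j in range(m):
        w = matvec(a, v)
        a_j = sum(wi * vi for wi, vi in zip(w, v))
        # Three-term recurrence: remove components along v and v_prev.
        w = [wi - a_j * vi - b * pi for wi, vi, pi in zip(w, v, v_prev)]
        alpha.append(a_j)
        b = math.sqrt(sum(x * x for x in w))
        if j == m - 1 or b < 1e-12:
            break  # last step, or the Krylov space is exhausted
        beta.append(b)
        v_prev, v = v, [x / b for x in w]
    return alpha, beta

a = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
alpha, beta = lanczos(a, [1.0, 1.0, 1.0], 3)
```

When the iteration runs for as many steps as the matrix dimension without breakdown, T is orthogonally similar to the input, so the sum of `alpha` should match the trace of `a`.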
Lanczos Diagonalization
● CUDA 5.5/Kepler overall yields pleasing communication results (CUDA-enabled Open MPI 1.7.3, MPI send/recv); collectives are less impressive
● Bandwidths for one-sided comms have some message size dependency & jitter, but effective bandwidth is much improved over previous generations
[Figure: bandwidth (GB/s) versus increasing message size (MB) within a single application]
● Results of 4 tests
● RHEL 6, Intel x86_64, Nvidia driver 331.38
● Communication between K20 & K40
CUDA needs + wants
● ODE and Function Solver(s), metabolic networks, chemically reactive
flows w/ OpenFOAM
→ support for more C++11 language features?
● Lanczos Diagonalization, DFT/quantum chemistry, incorporation into
Quantum Espresso
→ further improvements to GPUdirect (or use new multi-GPU
interfaces instead)?
● Batch LU/QR
→ increased warp size?
Summary
● Early adopters (astrophysics, quantum chemistry/condensed matter) are still active; most growth is seen in strands of computational biology/life science and 'big data'
● Teaching seminars are generally well received/attended, but...
● Most success comes from working to identify users/codes that can benefit from GPUs by monitoring clusters, and on a related note...
● The harvest is plentiful in academia but the workers are few; generally, if a code 'works' there is little pressure to make it better
● However, changes even in traditional CPU architecture are forcing workers to reevaluate their computational models (thanks to Ken Esler for this perspective); we live more and more in a parallel world
Acknowledgements
● Mark Berger, Chandra Cheij & Nvidia for generous donations
● {Ryan Eagen/Cowen group, Ali Khodayari/Maranas group, Sreejith
Jaya Ganesh, Jim Kubicki, Dan Haworth, Adri Van Duin} PSU
● {Chuck Gilbert, Jason Holmes} long-suffering sys admins
● HP for donation of 50 M2070
● XSEDE/TACC for Stampede cycles

CGYRO Performance on Power9 CPUs and Volta GPUS
 
OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
Cygnus - World First Multi-Hybrid Accelerated Cluster with GPU and FPGA Coupling
Cygnus - World First Multi-Hybrid Accelerated Cluster with GPU and FPGA CouplingCygnus - World First Multi-Hybrid Accelerated Cluster with GPU and FPGA Coupling
Cygnus - World First Multi-Hybrid Accelerated Cluster with GPU and FPGA Coupling
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
 
Early Application experiences on Summit
Early Application experiences on Summit Early Application experiences on Summit
Early Application experiences on Summit
 
E3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - SundanceE3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - Sundance
 

Dernier

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Dernier (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Nvidia GTC 2014 Talk

  • 1. Establishing a CUDA Research Center at Penn State: Perspectives on GPU-Enabled Teaching and Research. William J. Brouwer (wjb19@psu.edu), Pierre-Yves Taunay (py.taunay@psu.edu), Research Computing and Cyberinfrastructure, The Pennsylvania State University. Nvidia GTC 2014, 04/01/14.
  • 2. Outline
      ● Center Overview (RCC @ PSU)
      ● GPU accelerated research
      ● IceCube
      ● Metabolic Networks (Fsolve/cuSolve)
      ● MD + Simulated Annealing
      ● FQHE (LU Decomposition)
      ● Smart Proppants (QR Decomposition)
      ● GPU cluster scaling
      ● Amber
      ● PetaChem
      ● Quantum Espresso
          – Lanczos Diagonalization
      ● CUDA, needs + wants
      ● Summary
  • 3. Center Overview
      ● Research Computing and Cyberinfrastructure (RCC) at PSU provides high performance computing services:
          ● Hardware, proprietary/open source software
          ● Consultation (numerical/algorithmic, software development, etc.)
      ● PhDs, system admins and programmers work together to provide these services to academics while performing independent research
      ● Many users are interested in using GPUs for science and engineering research applications; we are a CUDA Research Center: https://research.nvidia.com/content/penn-state-crc-summary
      ● Formerly under ITS, currently being incorporated into the Office of the Vice President for Research (OVPR)
  • 4. Center Overview
      ● Hardware is ~12K CPU cores, 64 GPUs (Fermi), several Kepler
      ● Red Hat Linux, scheduling via PBS/Moab/Torque
      ● Usual monitoring/management tools, e.g. Puppet, Jenkins, Nagios, Ganglia, and some custom solution(s) (e.g. CLPR)
      ● Serve ~7k users across all campuses in the commonwealth
      ● Use CUDA predominantly, although growing numbers of users are trying OpenACC, OpenCL, libraries, etc.
      ● Environment modules system
  • 5. Center Overview
      ● Support many GPU accelerated applications
  • 7. IceCube
  • 8. Metabolic Networks
      ● Optimal models for the metabolic networks of microbial organisms are important in the pharma and energy industries
      ● Ensemble Modeling (EM) is used to construct chemical kinetics of microbial organisms → decompose metabolic reactions into elementary mechanisms, which are ODE systems f(k_i, y_j) = dy_j/dt
      ● Overall approach maximizes correlation between model predictions and experimental measurements, performed in steady state → solve f(k,y) = 0
  • 9. Metabolic Networks
      ● [CPU] Parse equations f(k,y)
      ● [CPU] Differentiate f(k,y), create analytic Jacobian J(k,y)
      ● [CPU] Populate data structures representing f(k,y) and J(k,y), copy to GPU
      ● [GPU] Iterate (Newton-Raphson) →
          ● Numerically evaluate f(k,y) and J(k,y) by parallel reduction
          ● Solve for delta in J(k,y) . delta = -f(k,y) using GMRES
          ● Update y += delta and repeat until ||f(k,y)|| < tol
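The Newton-Raphson loop on this slide can be sketched on the CPU in a few lines. This is a minimal illustration, not the authors' GPU code: the two-species reaction system, the function names, and the rate constants are hypothetical, and a small dense Gaussian elimination stands in for the GMRES solver used in the talk.

```python
import math

def residual(k, y):
    # Hypothetical toy kinetics: constant inflow k[0], bimolecular step
    # with rate k[1]*y0*y1, first-order decay of y1 with rate k[2].
    return [k[0] - k[1] * y[0] * y[1],
            k[1] * y[0] * y[1] - k[2] * y[1]]

def jacobian(k, y):
    # Analytic Jacobian J[i][j] = d f_i / d y_j of the toy system above.
    return [[-k[1] * y[1], -k[1] * y[0]],
            [k[1] * y[1], k[1] * y[0] - k[2]]]

def solve_dense(a, b):
    # Gaussian elimination with partial pivoting (stand-in for GMRES).
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def newton_steady_state(k, y0, tol=1e-10, max_iter=50):
    # Iterate: solve J . delta = -f, update y += delta, stop when ||f|| < tol.
    y = list(y0)
    for _ in range(max_iter):
        f = residual(k, y)
        if math.sqrt(sum(v * v for v in f)) < tol:
            return y
        delta = solve_dense(jacobian(k, y), [-v for v in f])
        y = [yi + di for yi, di in zip(y, delta)]
    raise RuntimeError("Newton-Raphson did not converge")
```

For example, `newton_steady_state((2.0, 1.0, 4.0), [1.0, 1.0])` converges to the steady state y0 = k2/k1, y1 = k0/k2 of the toy system. For the sparse, poorly conditioned Jacobians described in the talk, an iterative Krylov solver such as GMRES is the more natural choice than the dense solve used here.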
  • 10. Metabolic Networks
      ● Solution uses various libraries including Boost, Thrust, CUSP and CUDA
      ● Matrices are sparse and poorly conditioned, but the solution works well for O(10^2) equations
      ● Currently working to scale to larger, more interesting networks and microbial organisms
      ● CuSolve is a work in progress, a GPU-only ODE solver for stiff equations
  • 11. Molecular Dynamics + Sim Anneal
      ● Solve for MD potentials by fitting experimental data for structure factor
      ● Optimization surface (figure in original slide) is highly non-convex → use simulated annealing, each GPU performs an independent MD run
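The idea of running independent annealing chains and keeping the best result can be sketched as follows. This is an illustrative toy, not the MD fitting code from the talk: the one-dimensional `cost` function, the cooling schedule, and all parameter values are assumptions, and each chain here stands in for what the slide describes as an independent run on one GPU.

```python
import math
import random

def cost(x):
    # Hypothetical non-convex misfit with several local minima.
    return x * x + 10.0 * math.sin(3.0 * x) + 10.0

def anneal(seed, steps=2000, t_start=5.0, t_end=1e-3):
    # One independent chain with a geometric cooling schedule.
    rng = random.Random(seed)
    x = rng.uniform(-5.0, 5.0)
    cur_c = cost(x)
    best_x, best_c = x, cur_c
    for i in range(steps):
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        cand = x + rng.gauss(0.0, 0.5)
        cand_c = cost(cand)
        # Metropolis rule: always accept downhill moves, sometimes uphill.
        if cand_c < cur_c or rng.random() < math.exp((cur_c - cand_c) / t):
            x, cur_c = cand, cand_c
            if cur_c < best_c:
                best_x, best_c = x, cur_c
    return best_x, best_c

def best_of_chains(n_chains=8):
    # Chains share nothing, so they could run concurrently, one per device.
    return min((anneal(seed) for seed in range(n_chains)), key=lambda r: r[1])
```

Because the chains never communicate, this pattern scales trivially with the number of devices; the only reduction step is picking the lowest-cost result at the end.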
  • 12. LU Decomposition
      ● Batch LU decomposition developed for the fractional quantum Hall effect, fundamental physics that has implications in quantum computation and material science
      ● O(N!) determinants need to be evaluated in constructing the wavefunction; process is repeated many times in a Monte Carlo calculation
      ● Small, dense matrices of side <= 512
      ● Implementation exploits SIMD architecture, parallel reduction
      ● Example: for N=11 and 1024 Monte Carlo iterations, computation time using 8 GPU devices (with MPI) is ~246 seconds, down from ~31488 seconds on a single CPU (about 128x)
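Evaluating a determinant through LU decomposition amounts to multiplying the pivots and tracking the sign of row swaps. The sketch below shows that for a batch of small dense matrices; it is a serial reference version with hypothetical function names, not the GPU implementation from the talk, where each matrix (or each row update) would map to parallel threads.

```python
def lu_determinant(a):
    # Determinant via in-place LU elimination with partial pivoting.
    n = len(a)
    m = [row[:] for row in a]
    det = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            return 0.0  # singular matrix
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            det = -det  # each row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col + 1, n):
                m[r][c] -= factor * m[col][c]
    return det

def batch_determinants(batch):
    # Matrices are independent, so this loop is trivially parallel.
    return [lu_determinant(a) for a in batch]
```

Usage: `batch_determinants([[[2.0, 1.0], [1.0, 3.0]], [[0.0, 1.0], [1.0, 0.0]]])` evaluates one determinant per matrix in the batch.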
  • 14. QR Decomposition
      ● Proppant materials are used to stabilize fissures created during hydraulic fracturing
      ● 'Smart proppants' are essentially electrical dipoles which may absorb and re-emit EM energy, irradiated and recorded by downhole instrumentation
      ● This work considers an iteration-free solution to this EM scattering problem, using linear algebra including LU decomposition and SVD
      ● SVD can be performed using the QR algorithm, which in turn relies on QR decomposition
      ● Devised a unique approach for large batches of small dense matrices using Givens rotations; largely independent operations, maps well to GPU
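A textbook Givens-rotation QR factorization is sketched below to make the building block concrete. It is a plain serial version with a hypothetical function name, not the batched GPU scheme devised by the authors; each rotation touches only two rows, which is what makes the operations largely independent.

```python
import math

def givens_qr(a):
    # Returns (q, r) with a = q . r, q orthogonal, r upper triangular.
    n_rows, n_cols = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(n_rows)]
         for i in range(n_rows)]
    for col in range(n_cols):
        # Zero the entries below the diagonal, working from the bottom up.
        for row in range(n_rows - 1, col, -1):
            x, y = r[row - 1][col], r[row][col]
            h = math.hypot(x, y)
            if h == 0.0:
                continue
            c, s = x / h, y / h
            # Apply the rotation G to rows (row-1, row) of r.
            for j in range(n_cols):
                u, v = r[row - 1][j], r[row][j]
                r[row - 1][j] = c * u + s * v
                r[row][j] = -s * u + c * v
            # Accumulate q <- q . G^T on columns (row-1, row).
            for i in range(n_rows):
                u, v = q[i][row - 1], q[i][row]
                q[i][row - 1] = c * u + s * v
                q[i][row] = -s * u + c * v
    return q, r
```

Rotations acting on disjoint row pairs commute, so they can be applied concurrently; that property is one reason Givens rotations are a common choice for parallel hardware.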
  • 17. GPU Cluster Scaling
      ● Several key GPU accelerated software suites were tested using multiple GPUs across two clusters

      Cluster                     Lion-GA                            Stampede
      CPU                         12 X5675 @ 3.07 GHz                16 E5-2680 @ 2.70 GHz
      GPU                         8 M2070 or 8 M2090                 1 K20c
      Nodes equipped with GPUs    8                                  120
      Interconnect                40 Gb/s Mellanox QDR Infiniband    56 Gb/s Mellanox FDR Infiniband
  • 18. GPU Cluster Scaling
      ● Lion-GA cluster has 3 GPUs per PCIe switch, 3 to 5 GPUs per IOH chip
      ● IOH doesn't support peer-to-peer transfers between GPU devices on different chipsets
      ● Difficult to achieve peak transfer rates between GPUs on different sockets
  • 19. Amber
      ● Molecular Dynamics is widely used for simulation of solvated proteins or molecules and makes use of various force fields (AMBER, ReaxFF, etc.)
      ● The AMBER force field is implemented in the eponymous software suite
      ● The PMEMD program in AMBER is used for both explicit solvent Particle Mesh Ewald (PME) and implicit solvent Generalized Born (GB) simulations
      ● AMBER does not require extensive communication between GPUs or between CPU and GPU, and does not take advantage of the CPU if GPUs are used
      ● GPU acceleration allows for longer simulated times, on the order of a nanosecond or more
  • 20. Amber: PME simulation of DHFR protein in water (NPT ensemble, 23,558 atoms). [Chart: achieved performance on Lion-GA in ns/day for 12 X5675, 2 M2090, 4 M2090, 6 M2090 and 8 M2090]
  • 21. Amber: PME simulation of FactorIX molecule in water (NPT ensemble, 90,906 atoms). [Chart: achieved performance on Lion-GA in ns/day for 12 X5675, 2 M2090, 4 M2090, 6 M2090 and 8 M2090]
  • 22. Amber: PME simulation of Cellulose molecule in water (NPT ensemble, 408,609 atoms). [Chart: achieved performance on Lion-GA in ns/day for 12 X5675, 2 M2090, 4 M2090, 6 M2090 and 8 M2090]
  • 23. Amber: Implicit solvent GB simulation of Myoglobin (2,492 atoms). [Chart: achieved performance on Lion-GA in ns/day for 12 X5675, 2 M2090, 4 M2090, 6 M2090 and 8 M2090]
  • 24. Amber: Implicit solvent GB simulation of Nucleosome (25,095 atoms). [Chart: achieved performance on Lion-GA in ns/day for 12 X5675, 2 M2090, 4 M2090, 6 M2090 and 8 M2090]
  • 25. PetaChem
      ● Quantum chemistry package designed to run on NVIDIA hardware
      ● Features restricted Hartree-Fock and grid-based Kohn-Sham single point energy and gradient calculations
      ● Various functionals supported, geometry optimization, ab initio molecular dynamics, support for multi-GPU
      ● Benchmark: single point energy for Olestra, using the 6-31g basis
  • 26. PetaChem: Olestra SCF calculation. [Chart: total walltime (s) on Lion-GA for 1 M2070, 3 M2070, 5 M2070 and 7 M2070]
  • 27. Quantum Espresso
      ● Density Functional Theory (DFT) has enjoyed huge growth in popularity owing to computational and numerical advancements; used widely in material science
      ● Quantum Espresso (QE) is an open source DFT package that has recently added GPU acceleration, largely through BLAS and FFT routines
      ● When building QE with MAGMA (UT/ORNL) or phiGEMM, one introduces heterogeneous CPU/GPU linear algebra routines
      ● Benchmark:
          ● Self-consistent field calculation, using PBE pseudopotentials, 168 atoms (cellulose)
          ● Periodic boundary conditions, kinetic energy cutoff for charge density of 80 Ry, Davidson diagonalization
  • 28. Quantum Espresso: SCF calculation for cellulose. [Chart: total walltime (hrs) on Stampede@TACC for 1, 2, 4, 8, 16 and 32 K20]
  • 29. Lanczos Diagonalization
      ● A key task in many applications, especially quantum chemistry and DFT, is diagonalization, i.e. matrix eigen-decomposition
      ● Lanczos is a power-type (Krylov subspace) method that produces a tridiagonal matrix, which is more readily solvable; it consists of many matrix-vector operations, very amenable to GPU; currently using cuBLAS and MKL in a heterogeneous solution
      ● Originally devised for a fundamental physics project at PSU, now intended for incorporation into the GPU-Quantum Espresso project being led by Filippo Spiga
      ● Attempting to scale to multiple devices using MPI + GPUdirect; still beset by some numerical/convergence problems with increasing matrix size
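The way Lanczos reduces a symmetric matrix to tridiagonal form using only matrix-vector products can be sketched as below. This is a minimal dense, serial illustration with hypothetical helper names, not the cuBLAS/MKL implementation from the talk. It reorthogonalizes against all previous basis vectors, one common (if costly) guard against the loss of orthogonality that can cause convergence problems of the kind mentioned on the slide.

```python
import math

def matvec(a, v):
    # Dense matrix-vector product; the dominant cost of each iteration.
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def lanczos(a, v0, steps):
    # Returns the diagonal (alphas) and off-diagonal (betas) of the
    # tridiagonal matrix T = Q^T A Q for symmetric a.
    norm = math.sqrt(dot(v0, v0))
    basis = [[x / norm for x in v0]]
    alphas, betas = [], []
    for j in range(steps):
        w = matvec(a, basis[j])
        alphas.append(dot(basis[j], w))
        # Full reorthogonalization against every basis vector so far.
        for q in basis:
            proj = dot(q, w)
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        beta = math.sqrt(dot(w, w))
        if j == steps - 1 or beta < 1e-12:
            break  # done, or an invariant subspace was found
        betas.append(beta)
        basis.append([wi / beta for wi in w])
    return alphas, betas
```

The eigenvalues of the small tridiagonal matrix defined by `alphas` and `betas` approximate the extremal eigenvalues of the original matrix and can be found with any standard tridiagonal eigensolver.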
  • 31. Lanczos Diagonalization
      ● CUDA 5.5/Kepler overall yields pleasing communication results (CUDA-enabled Open MPI 1.7.3, MPI send/recv); collectives less impressive
      ● Bandwidths for one-sided comms have some message size dependency and jitter, but effective bandwidth is much improved over previous generations
      ● [Chart: bandwidth (GB/s) vs. increasing message size (MB) within a single application; results of 4 tests]
      ● RHEL 6, Intel x86_64, Nvidia driver 331.38
      ● Communication between K20 and K40
  • 33. CUDA needs + wants
      ● ODE and function solver(s), metabolic networks, chemically reactive flows with OpenFOAM → support for more C++11 language features?
      ● Lanczos diagonalization, DFT/quantum chemistry, incorporation into Quantum Espresso → further improvements to GPUdirect (or use new multi-GPU interfaces instead)?
      ● Batch LU/QR → increased warp size?
  • 34. Summary
      ● Early adopters in astrophysics and quantum chemistry/condensed matter are still active; most growth is seen in strands of computational biology/life science and 'big data'
      ● Teaching seminars are generally well received/attended, but...
      ● Most success comes from working to identify users/codes that can benefit from GPUs by monitoring clusters, and on a related note...
      ● The harvest is plentiful in academia but the workers are few; generally, if a code 'works' there is little pressure to make it better
      ● However, changes even in traditional CPU architecture are forcing workers to reevaluate their computational models (thanks to Ken Esler for this perspective); we live more and more in a parallel world
  • 35. Acknowledgements
      ● Mark Berger, Chandra Cheij & Nvidia for generous donations
      ● {Ryan Eagen/Cowen group, Ali Khodayari/Maranas group, Sreejith Jaya Ganesh, Jim Kubicki, Dan Haworth, Adri Van Duin} PSU
      ● {Chuck Gilbert, Jason Holmes} long-suffering sys admins
      ● HP for donation of 50 M2070
      ● XSEDE/TACC for Stampede cycles