Deep Learning Accelerator Design Techniques

Mindos Cheng

Some notes on deep learning hardware design.

Design Techniques for DLA
* DLA stands for Deep Learning Accelerator
(draft)

https://en.wikipedia.org/wiki/The_Boss_Baby

Dark Silicon
https://www.publicdomainpictures.net/en/view-image.php?image=44607&picture=portrait-of-the-dark-sides-man

Roofline Model
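The roofline model bounds attainable throughput by the smaller of two limits: peak compute, and memory bandwidth times arithmetic intensity (FLOPs per byte moved from off-chip memory). Below is a minimal Python sketch. The accelerator figures (1000 GFLOP/s peak, 100 GB/s bandwidth) are hypothetical and chosen only for illustration.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: min(compute roof, bandwidth x arithmetic intensity).

    intensity is in FLOPs per byte moved from off-chip memory."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity (FLOP/byte) at which the memory roof meets the compute roof."""
    return peak_gflops / bandwidth_gbs


# Hypothetical accelerator: 1000 GFLOP/s peak, 100 GB/s off-chip bandwidth.
PEAK, BW = 1000.0, 100.0
for ai in (1.0, 10.0, 50.0):
    bound = "memory" if ai < ridge_point(PEAK, BW) else "compute"
    print(f"AI={ai:5.1f} FLOP/B -> {attainable_gflops(PEAK, BW, ai):7.1f} GFLOP/s ({bound}-bound)")
```

Layers whose intensity falls left of the ridge point (10 FLOP/byte here) are memory-bound, so adding more MAC units does not speed them up.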

Layer Behaviors

Convolution
• Computation-intensive
• Kernel sizes range from about 1x1xC to 11x11xC (C = number of input channels)
• Variants: depthwise separable convolution, sparse convolution
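The sketch below makes "computation intensive" concrete. It counts multiply-accumulates (MACs) for a standard convolution and for a depthwise separable one, which is a k x k depthwise pass followed by a 1x1 pointwise pass. The layer sizes are hypothetical.

```python
def conv_macs(h_out, w_out, c_in, c_out, k):
    """Standard k x k conv: every output pixel of every output channel
    is a dot product of length k * k * c_in."""
    return h_out * w_out * c_out * k * k * c_in


def depthwise_separable_macs(h_out, w_out, c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) followed by a
    1x1 pointwise conv that mixes the channels."""
    depthwise = h_out * w_out * c_in * k * k
    pointwise = h_out * w_out * c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 56x56 output, 128 -> 128 channels, 3x3 kernel.
std = conv_macs(56, 56, 128, 128, 3)
sep = depthwise_separable_macs(56, 56, 128, 128, 3)
print(std, sep, round(std / sep, 2))
```

For these sizes the separable form needs roughly 8.4x fewer MACs. In general the ratio is 1 / (1/C_out + 1/k^2), so the saving grows with the number of output channels.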

Fully-Connected
• Holds most of the weights
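Fully-connected layers hold most of the weights for a structural reason. A convolution reuses one small filter at every spatial position, while an FC layer has a separate weight for every input-output pair. The shapes below follow VGG-16: its first FC layer maps a 7x7x512 feature map to 4096 units, and it also has 3x3, 512-channel conv layers.

```python
def fc_weights(n_in, n_out):
    """Fully-connected layer: one weight per (input, output) pair."""
    return n_in * n_out


def conv_weights(c_in, c_out, k):
    """Convolution: one k x k x c_in filter per output channel, shared across positions."""
    return k * k * c_in * c_out


fc6 = fc_weights(7 * 7 * 512, 4096)   # VGG-16 fc6
conv5 = conv_weights(512, 512, 3)     # one VGG-16 conv5 layer
print(fc6, conv5, fc6 // conv5)
```

At batch size 1, each FC weight is used only once per input. FC layers therefore tend to be limited by memory bandwidth rather than by compute.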

CNN Accelerator

Hardware Accelerator Design for Machine Learning, 2016

Filter Decomposition

For Larger Convolution Kernels
* A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things
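This sketch assumes the slide refers to the following way of running larger kernels on hardware built around 3x3 filters. First zero-pad the k x k kernel to a multiple of 3 and cut it into 3x3 tiles. Then run each tile on a shifted window of the input and sum the partial results. The pure-Python version below is checked against a direct convolution.

```python
def correlate2d_valid(x, w):
    """'Valid' 2-D cross-correlation, i.e. what a CNN conv layer computes."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + u][j + v] * w[u][v] for u in range(kh) for v in range(kw))
             for j in range(ow)] for i in range(oh)]


def decompose_to_3x3(w):
    """Zero-pad a k x k kernel up to a multiple of 3 and cut it into 3x3 tiles.

    Returns (row_offset, col_offset, tile) triples."""
    k = len(w)
    p = -(-k // 3) * 3                      # k rounded up to a multiple of 3
    padded = [[w[r][c] if r < k and c < k else 0 for c in range(p)] for r in range(p)]
    return [(r0, c0, [row[c0:c0 + 3] for row in padded[r0:r0 + 3]])
            for r0 in range(0, p, 3) for c0 in range(0, p, 3)]


def conv_via_3x3(x, w):
    """Compute a k x k convolution using only 3x3 convolutions plus additions."""
    k, h, wd = len(w), len(x), len(x[0])
    p = -(-k // 3) * 3
    oh, ow = h - k + 1, wd - k + 1
    # Pad the input with (p - k) zero rows/columns so overhanging tiles stay in range.
    xp = [row + [0] * (p - k) for row in x] + [[0] * (wd + p - k) for _ in range(p - k)]
    out = [[0] * ow for _ in range(oh)]
    for r0, c0, tile in decompose_to_3x3(w):
        # Each tile sees the input shifted by its offset inside the padded kernel.
        window = [row[c0:c0 + ow + 2] for row in xp[r0:r0 + oh + 2]]
        partial = correlate2d_valid(window, tile)
        for i in range(oh):
            for j in range(ow):
                out[i][j] += partial[i][j]
    return out


x = [[(3 * r + c) % 7 for c in range(8)] for r in range(8)]
w5 = [[(r - c) % 4 - 1 for c in range(5)] for r in range(5)]
assert conv_via_3x3(x, w5) == correlate2d_valid(x, w5)
print(len(decompose_to_3x3(w5)), "tiles for a 5x5 kernel")
```

A 5x5 kernel becomes four 3x3 tiles (a 6x6 padded kernel), and an 11x11 kernel becomes sixteen. The zero-padded weights are wasted MACs, which is the price of a fixed 3x3 datapath.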

Hardware Accelerator Design for Machine Learning, 2016

Model Compression

Pruned and retrained
Hardware Accelerator Design for Machine Learning, 2016
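Here is a minimal sketch of magnitude pruning followed by retraining. It drops the smallest-magnitude weights, then keeps a binary mask so that retraining updates only the surviving connections. The 50% sparsity, learning rate, gradients and weight values are illustrative only.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Returns (pruned_weights, mask) where mask[i] is 1 for kept weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_prune])
    mask = [0 if i in dropped else 1 for i in range(len(weights))]
    return [w * m for w, m in zip(weights, mask)], mask


def masked_sgd_step(weights, grads, mask, lr=0.1):
    """One retraining step: survivors are updated, pruned weights stay at zero."""
    return [(w - lr * g) * m for w, g, m in zip(weights, grads, mask)]


w = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, -0.2, 0.9]
pruned, mask = magnitude_prune(w, sparsity=0.5)
retrained = masked_sgd_step(pruned, grads=[0.1] * len(w), mask=mask)
print(mask)
print(retrained)
```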

Pruned and retrained
Deep compression: Compressing DNNs with pruning, trained quantization and Huffman coding, 2015
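The cited Deep Compression pipeline follows pruning with quantization and Huffman coding. This sketch covers only the two encoding ideas. First, a 1-D k-means maps weights to a few shared values. Second, a Huffman code over the resulting cluster indices spends fewer bits on frequent clusters. The gradient fine-tuning of the shared centroids (the "trained" part of the paper's quantization) is omitted. The linear centroid initialization is a choice made here.

```python
import heapq
from collections import Counter


def share_weights(weights, n_clusters, iters=20):
    """1-D k-means: map every weight to one of n_clusters shared values."""
    lo, hi = min(weights), max(weights)
    # Linear initialization: centroids evenly spaced over [min, max].
    centroids = [lo + (hi - lo) * c / (n_clusters - 1) for c in range(n_clusters)]

    def assign():
        return [min(range(n_clusters), key=lambda c: abs(w - centroids[c])) for w in weights]

    for _ in range(iters):
        idx = assign()
        for c in range(n_clusters):
            members = [w for w, i in zip(weights, idx) if i == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return assign(), centroids


def huffman_code_lengths(symbols):
    """Code length in bits for each symbol of a Huffman code built from frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {s: 1 for s in freq}
    # Heap entries: (count, tie_breaker, {symbol: depth}); the tie breaker avoids comparing dicts.
    heap = [(n, t, {s: 0}) for t, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, a = heapq.heappop(heap)
        n2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


weights = [-1.0, -0.9, 0.0, 0.05, -0.05, 0.0, 0.0, 0.95, 1.0]
idx, centroids = share_weights(weights, n_clusters=3)
lengths = huffman_code_lengths(idx)
huffman_bits = sum(lengths[i] for i in idx)
fixed_bits = len(idx) * (3 - 1).bit_length()   # 2 bits per index without Huffman
print(idx, [round(c, 3) for c in centroids], huffman_bits, fixed_bits)
```

In this example the nine indices take 13 bits with Huffman coding, against 18 bits at a fixed 2 bits each.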

Tensor Core
"SIMD" for GPU
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/
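According to the linked NVIDIA post, each Volta tensor core performs a 4x4x4 matrix multiply-accumulate, D = A x B + C. A and B are FP16, and C and D may be FP16 or FP32. The sketch below emulates that unit in Python and tiles a larger GEMM into a sequence of such operations. The FP16 rounding uses the `struct` module's half-precision format, and Python floats stand in for the wider accumulator, so this is a functional model only.

```python
import struct


def to_fp16(x):
    """Round a number to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", float(x)))[0]


def mma_4x4x4(a, b, c):
    """Emulate one 4x4x4 matrix multiply-accumulate: D = A x B + C.

    A and B are rounded to FP16; the sum is kept in a Python float, which
    stands in for the wider accumulator."""
    return [[c[i][j] + sum(to_fp16(a[i][k]) * to_fp16(b[k][j]) for k in range(4))
             for j in range(4)] for i in range(4)]


def tiled_gemm(A, B):
    """C = A x B for n x n matrices with n a multiple of 4, issued as 4x4x4 MMAs."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, 4):
        for j0 in range(0, n, 4):
            acc = [[0.0] * 4 for _ in range(4)]
            for k0 in range(0, n, 4):       # walk the shared dimension tile by tile
                a = [row[k0:k0 + 4] for row in A[i0:i0 + 4]]
                b = [row[j0:j0 + 4] for row in B[k0:k0 + 4]]
                acc = mma_4x4x4(a, b, acc)
            for i in range(4):
                C[i0 + i][j0:j0 + 4] = acc[i]
    return C


A = [[(i + 2 * j) % 5 - 2 for j in range(8)] for i in range(8)]
B = [[(3 * i - j) % 7 - 3 for j in range(8)] for i in range(8)]
ref = [[sum(A[i][k] * B[k][j] for k in range(8)) for j in range(8)] for i in range(8)]
assert tiled_gemm(A, B) == ref  # small integers are exact in FP16
```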

Systolic Array
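This is a cycle-by-cycle sketch of an output-stationary systolic array. That is one common dataflow; the slide does not say which one it has in mind. Each processing element (PE) keeps one element of C. Rows of A stream in from the left and columns of B from the top. Each stream is skewed by one cycle per row or column, so matching operands meet in the right PE.

```python
def systolic_matmul(A, B):
    """Simulate an n x m output-stationary systolic array computing A (n x K) x B (K x m)."""
    n, K, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]      # C element held inside each PE
    a_reg = [[0] * m for _ in range(n)]    # A operand latched in each PE
    b_reg = [[0] * m for _ in range(n)]    # B operand latched in each PE
    for t in range(K + n + m - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:   # left edge: feed row i of A, delayed by i cycles
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < K else 0
                else:        # otherwise take the operand from the left neighbour
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:   # top edge: feed column j of B, delayed by j cycles
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < K else 0
                else:        # otherwise take the operand from the neighbour above
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        for i in range(n):   # every PE does one multiply-accumulate per cycle
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

An n x m array finishes a K-deep product in K + n + m - 2 cycles. PEs talk only to their neighbours, so wires stay short, and each operand fetched from memory is reused across a whole row or column of PEs.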

GotoBLAS library
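GotoBLAS-style GEMM gets much of its speed from blocking for the cache hierarchy. It loops over panels of B and blocks of A, copies ("packs") each into a contiguous buffer, and runs a small inner kernel on the packed data. The pure-Python sketch below shows only that loop structure. The block sizes mc, kc and nc are placeholders (a real library tunes them to cache sizes), and Python list slices merely mimic packing.

```python
def naive_gemm(A, B):
    """Reference triple-loop matrix multiply."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def blocked_gemm(A, B, mc=4, kc=4, nc=4):
    """Cache-blocked GEMM with GotoBLAS-style loop order and packing."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for jc in range(0, m, nc):
        for pc in range(0, k, kc):
            # Pack a kc x nc panel of B (meant to stay resident in a large cache).
            Bp = [B[p][jc:jc + nc] for p in range(pc, min(pc + kc, k))]
            for ic in range(0, n, mc):
                # Pack an mc x kc block of A (meant to stay resident in a smaller cache).
                Ap = [A[i][pc:pc + kc] for i in range(ic, min(ic + mc, n))]
                # Inner kernel: accumulate the packed block product into C.
                for ii, arow in enumerate(Ap):
                    crow = C[ic + ii]
                    for pp, a in enumerate(arow):
                        for jj, b in enumerate(Bp[pp]):
                            crow[jc + jj] += a * b
    return C


A = [[(i * 7 + j * 3) % 5 - 2 for j in range(7)] for i in range(5)]
B = [[(i * 2 + j * 5) % 9 - 4 for j in range(6)] for i in range(7)]
assert blocked_gemm(A, B) == naive_gemm(A, B)
```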

General Tricks
• Burst/fetch contiguous blocks
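Fetching contiguous blocks matters because DRAM transfers whole aligned bursts, so a strided access pattern pays for bytes it never uses. The small sketch below assumes a hypothetical 64-byte burst and a 64x64 FP16 matrix stored row-major.

```python
def bursts_needed(addresses, burst_bytes=64):
    """Count the aligned DRAM bursts touched by a set of byte addresses."""
    return len({addr // burst_bytes for addr in addresses})


def element_addr(r, c, cols=64, elem_bytes=2):
    """Byte address of element (r, c) in a row-major matrix starting at 0."""
    return (r * cols + c) * elem_bytes


row = [element_addr(0, c) for c in range(64)]   # one row: contiguous
col = [element_addr(r, 0) for r in range(64)]   # one column: 128-byte stride
print(bursts_needed(row), bursts_needed(col))
```

Both reads deliver 128 useful bytes. The column read, however, moves 64 bursts x 64 bytes = 4096 bytes over the bus, so laying data out so that tiles are read along contiguous addresses avoids most of that traffic.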

Analog computing
Hardware Accelerator Design for Machine Learning, 2016

Thermal
Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era, 2016

Memory Bandwidth
Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era, 2016

• Most of the energy is consumed not in computation but in moving data to and from memory
• Moving from 16- to 64-bit fetches changes the energy by only 1.5x
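We read the second point as "a 64-bit fetch costs about 1.5 times the energy of a 16-bit fetch"; that interpretation is ours. On that reading, the energy per bit falls sharply as fetches get wider. Here $E_{16}$ and $E_{64}$ denote the energy of one 16-bit and one 64-bit fetch:

```latex
\frac{E_{64}/64}{E_{16}/16} = \frac{1.5\,E_{16}/64}{E_{16}/16} = \frac{1.5}{4} = 0.375
```

Each bit therefore costs about 62.5% less when fetched 64 bits at a time, which favours wide, contiguous transfers.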

• Zero copy
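Zero copy means the accelerator works directly on a buffer the host already owns, instead of staging the data through an extra copy. The example below is a software-level analogy only, not an accelerator driver API. Python's `memoryview` gives a view into an existing buffer without copying it, so writes through the view land in the shared buffer while an earlier copy goes stale.

```python
shared = bytearray(16)              # stands in for a buffer both sides can see
staging = bytes(shared)             # a conventional staging copy, taken up front

view = memoryview(shared)[4:8]      # slicing a memoryview does not copy
view[:] = b"\x01\x02\x03\x04"       # writes go straight into `shared`

print(shared[4:8], staging[4:8])    # `shared` sees the data; the copy is stale
view.release()                      # release the export before `shared` is resized
```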

• Conv memory remapping
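One widely used conv memory remapping is im2col. Each receptive field is copied into a row of a matrix, so the convolution becomes a single matrix multiply that GEMM hardware (tensor cores, systolic arrays) can run. Whether the slide means im2col specifically is an assumption. The pure-Python sketch below (stride 1, no padding) is checked against a direct convolution.

```python
def im2col(x, k):
    """x: input as [C][H][W] nested lists (stride 1, no padding).

    Returns (patches, oh, ow): one row per output pixel, C*k*k values per row."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    oh, ow = H - k + 1, W - k + 1
    patches = [[x[c][i + u][j + v] for c in range(C) for u in range(k) for v in range(k)]
               for i in range(oh) for j in range(ow)]
    return patches, oh, ow


def conv_as_gemm(x, weights, k):
    """weights: [C_out][C][k][k]. Each filter is flattened in the same (c, u, v)
    order as the patches, so every output value is one row-by-column dot product."""
    patches, oh, ow = im2col(x, k)
    filters = [[w[c][u][v] for c in range(len(w)) for u in range(k) for v in range(k)]
               for w in weights]
    out = [[[0] * ow for _ in range(oh)] for _ in weights]
    for p, patch in enumerate(patches):
        i, j = divmod(p, ow)
        for o, f in enumerate(filters):
            out[o][i][j] = sum(a * b for a, b in zip(patch, f))
    return out


def conv_direct(x, weights, k):
    """Reference: direct nested-loop convolution with the same layout."""
    C, oh, ow = len(x), len(x[0]) - k + 1, len(x[0][0]) - k + 1
    return [[[sum(x[c][i + u][j + v] * w[c][u][v]
                  for c in range(C) for u in range(k) for v in range(k))
              for j in range(ow)] for i in range(oh)] for w in weights]


x = [[[(c * 5 + i * 3 + j) % 4 - 1 for j in range(5)] for i in range(5)] for c in range(2)]
weights = [[[[(o + c + u * 2 + v) % 3 - 1 for v in range(3)] for u in range(3)]
            for c in range(2)] for o in range(3)]
assert conv_as_gemm(x, weights, 3) == conv_direct(x, weights, 3)
```

The cost is memory: with a 3x3 kernel, each input value can be duplicated up to nine times in the patch matrix.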

• Embedded Binarized Neural Networks
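Binarized networks constrain weights and activations to +1 or -1. A dot product then reduces to an XNOR followed by a population count, which is far cheaper in hardware than multiply-accumulate. The Python sketch below uses ints as bit vectors. The encoding (+1 stored as bit 1, -1 as bit 0) is a choice made here.

```python
def pack_bits(vec):
    """Encode a vector of +1/-1 values as an int: bit i is 1 for +1, 0 for -1."""
    word = 0
    for i, v in enumerate(vec):
        if v > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two +1/-1 vectors of length n from their packed bits.

    matches = popcount(XNOR(a, b)); dot = matches - mismatches = 2 * matches - n."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
print(binary_dot(pack_bits(a), pack_bits(b), len(a)), sum(x * y for x, y in zip(a, b)))
```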
