Copyright © 2016 Intel Corporation 1
Accelerating Deep Learning Using Altera FPGAs
Bill Jenkins
May 3, 2016
Legal Notices and Disclaimers
• Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service
activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure.
• Tests document performance of components on a particular test, in specific systems. Results have been estimated or simulated
using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Differences in
hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance
as you consider your purchase. For more complete information about performance and benchmark results, visit
http://www.intel.com/performance.
• Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances
and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs
or cost reduction.
• All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product
specifications and roadmaps.
• Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-
looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s
results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K.
• The products described may contain design defects or errors known as errata which may cause the product to deviate from
published specifications. Current characterized errata are available on request.
• No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
• Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the
referenced web site and confirm whether referenced data are accurate.
• Intel, the Intel logo, and Xeon and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and
brands may be claimed as the property of others.
Altera and Intel Enhance the FPGA Value Proposition
• Accelerated FPGA innovation from combined R&D scale
• Improved FPGA performance/power via early access and greater optimization of process node advancements
• New, breakthrough Data Center and IoT products harnessing combined FPGA + CPU expertise
Accelerated FPGA investment
Operational excellence
STRATEGIC RATIONALE
• Superior product design capabilities
• Continued excellence in customer service and support
• Increased resources bolster long-term innovation
• Focused, additive investments today
What is Machine Learning?
• Extracting features from data in order to solve predictive problems
  • Image classification & detection
  • Image recognition/tagging
  • Network intrusion detection
  • Fraud / face detection
• Aim is programs that automatically learn to recognize complex patterns and make intelligent decisions based on insight generated from learning
• For accuracy, models must be trained, tested and calibrated to detect patterns using previous experience
When to Apply Machine Learning
• Human expertise is absent
  • Navigating to Pluto
• Humans cannot explain their expertise
  • Speech recognition
• Solution changes over time
  • Tracking traffic
• Solution needs to be adapted to particular cases
  • Medical diagnosis
• Problem is vast in relation to human reasoning capabilities
  • Ranking web pages on Google or Bing
Value Proposition of Machine Learning
[Figure: Increasing Variety of Things × 35 ZB/s (Volume × Velocity = Throughput) = "Data is the problem". Separating Signal from Noise Provides Value: Revenue Growth, Cost Savings, Increased Margin]
Convolutional Neural Networks (CNN)
• A network of interconnected neurons, modeled after biological processes, for computing approximate functions
• Layers extract successively higher-level features
• Often want a custom topology to meet specific application accuracy/throughput requirements
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based Learning Applied to Document Recognition. Proceedings of the IEEE, 1998.
CNN Computation in One Slide
$$I_{\text{new}}(x, y) = \sum_{x'=-1}^{1} \sum_{y'=-1}^{1} I_{\text{old}}(x + x',\, y + y') \times F(x', y')$$
[Figure: Input Feature Map (Set of 2D Images), Filter (3D Space), Output Feature Map]
Repeat for Multiple Filters to Create Multiple “Layers” of Output Feature Map
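The sum on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a single 2D input plane, a 3x3 filter indexed from -1 to 1, and zero padding at the borders (the slide does not specify border handling). The function name `convolve3x3` is hypothetical.

```python
def convolve3x3(image, filt):
    """Apply I_new(x, y) = sum over x', y' in {-1, 0, 1} of
    I_old(x + x', y + y') * F(x', y') to one 2D plane.

    image: list of rows, indexed image[y][x].
    filt: 3x3 list where filt[y' + 1][x' + 1] holds F(x', y').
    Pixels outside the image are treated as zero (an assumption).
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += image[yy][xx] * filt[dy + 1][dx + 1]
            out[y][x] = acc
    return out


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
plane = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
same = convolve3x3(plane, identity)   # identity filter leaves the plane unchanged
summed = convolve3x3(plane, box)      # box filter sums each 3x3 neighbourhood
```

Summing this result over every input plane gives one output feature map; each additional filter produces another output map.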
What’s in my FPGA?
• DSPs
  • Dedicated single-precision floating point multiply and accumulators
• Block RAMs
  • Small embedded memories that can be stitched to form an arbitrary memory system
• Programmable Interconnect
  • Programmable logic and routing that can build arbitrary topologies
• Compute architecture with high degree of customization
Why an FPGA for CNN? (Arria 10)
• 1 TFLOP floating point performance in mid-range part
  • 35W total device power
• Use every DSP, every clock cycle: compute spatially
• 8 TB/s memory bandwidth to keep the state on chip!
  • Exceeds available external bandwidth by factor of 50
  • Random access, low latency (2 clks)
• Place all data in on-chip memory: compute temporally
[Figure: multiply (X) and add (+) units alongside M20K memory blocks. Fine-grained & low latency between compute and memory]
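The figures quoted on this slide imply a few derived numbers. The sketch below only does arithmetic on the stated values (1 TFLOP, 35 W, 8 TB/s, factor of 50). The variable names are illustrative, and the results are back-of-envelope estimates rather than measurements.

```python
# Values quoted on the slide for a mid-range Arria 10 part.
peak_flops = 1e12        # 1 TFLOP floating-point performance
device_power_w = 35.0    # total device power in watts
on_chip_bw = 8e12        # 8 TB/s on-chip memory bandwidth, in bytes/s
bw_ratio = 50            # on-chip bandwidth exceeds external by this factor

# Derived estimates.
gflops_per_watt = peak_flops / device_power_w / 1e9
external_bw_gb_s = on_chip_bw / bw_ratio / 1e9
bytes_per_flop_on_chip = on_chip_bw / peak_flops

print(f"{gflops_per_watt:.1f} GFLOP/s per watt")
print(f"{external_bw_gb_s:.0f} GB/s implied external bandwidth")
print(f"{bytes_per_flop_on_chip:.0f} bytes of on-chip bandwidth per FLOP")
```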
CNNs on FPGAs — Scalable Architecture
Market Demands Scalability for Machine Learning
Cloud Analytics
• 1000s of Classes
• Large Workloads
• Highly Efficient (Performance / W)
• Varying accuracy
• Server Form Factor
Transportation Safety
• < 10 Classes
• Frame Rate: 15–30fps
• Power: 1W-5W
• Cost: Low
• Varying accuracy
• Custom Form Factor
Different Parallelism in CNN
Old Approach
• Parallelism across the “face” of the kernel window, and across multiple convolution stages
• Low hardware re-use
New Approach
• Parallelism in the depth of the kernel window and across output features
  • Defer complex spatial math to random access memory
• Re-use hardware to compute multiple layers
Scalable CNN Computations — In One Slide
[Figure: Filters, three “accum” blocks, Output Feature Map]
“Slide”: No data movement. Addressing an on-chip RAM!
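One way to read "no data movement" is that the window slides by changing read addresses into a RAM that already holds the feature map, rather than by copying pixels into a new buffer. The sketch below illustrates that idea in software. The row-major layout with depth innermost and the names `ram_address` and `read_window` are assumptions for illustration, not the layout used by the Altera design.

```python
def ram_address(x, y, c, width, depth):
    """Flat address of element (x, y, c) in a row-major buffer, depth innermost."""
    return (y * width + x) * depth + c


def read_window(ram, x0, y0, width, depth, k=3):
    """Read a k x k x depth window whose top-left corner is (x0, y0).

    Only addresses are computed; the stored feature map is never copied
    or shifted when the window moves.
    """
    return [
        ram[ram_address(x0 + dx, y0 + dy, c, width, depth)]
        for dy in range(k)
        for dx in range(k)
        for c in range(depth)
    ]


width, height, depth = 4, 4, 2
# Store each element's own address so the reads are easy to check.
ram = list(range(width * height * depth))
first = read_window(ram, 0, 0, width, depth)
shifted = read_window(ram, 1, 0, width, depth)  # window moved one pixel right
```

Moving the window one pixel to the right changes every read address by `depth` and leaves the contents of `ram` untouched.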
Scalable CNN Architecture on FPGA (1)
[Figure: FPGA block diagram. Labels: DDR; Double-Buffer On-Chip RAM; Filters (on-chip RAM); # of Parallel Convolutions]
Scalable CNN Architecture on FPGA (2)
• Array size (x, y)
• Clock rate
• External memory bandwidth
• Layer descriptions
Calculated throughput & resource utilization
• Given resource constraints, find optimal architecture
• Ex. AlexNet on A10-115 is 52x26 for 800 img/s @ 350 MHz
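A first-order version of such a throughput model can be written down directly. The sketch below assumes each element of the x-by-y array completes one multiply-accumulate (MAC) per clock cycle and ignores external memory bandwidth. Both are simplifying assumptions, and `macs_per_image` and `efficiency` are hypothetical inputs that would come from the layer descriptions. It is not the model used to produce the 800 img/s figure.

```python
def peak_macs_per_second(array_x, array_y, clock_hz):
    """Peak MACs per second for an array_x by array_y array,
    assuming one MAC per element per cycle (an assumption)."""
    return array_x * array_y * clock_hz


def estimated_images_per_second(array_x, array_y, clock_hz,
                                macs_per_image, efficiency=1.0):
    """First-order throughput estimate.

    macs_per_image: total MACs the network needs per image (hypothetical input).
    efficiency: fraction of cycles doing useful work, in (0, 1].
    """
    if macs_per_image <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("macs_per_image must be > 0 and efficiency in (0, 1]")
    peak = peak_macs_per_second(array_x, array_y, clock_hz)
    return peak * efficiency / macs_per_image


# The 52x26 array at 350 MHz quoted for AlexNet on the A10-115.
peak = peak_macs_per_second(52, 26, 350_000_000)
print(f"{peak / 1e9:.1f} GMAC/s peak")
```

A fuller model would also cap throughput by external memory bandwidth and check DSP and M20K usage against the device limits.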
Scalable CNN Architecture on FPGA (3)
• Choice of parallelism has large impact on end compute architecture and properties of solution
• Defined a scalable approach to CNNs on the FPGA
  • Not tied to specific FPGA device
  • Not tied to specific CNN topology
• Design Methodology:
  1. Fit largest possible accelerator network on FPGA (52x26 on Arria 10)
     • Limited by DSP Blocks & M20K (RAM) Resources
  2. Tile network onto available accelerator
     • Decompose filter window into 1x1xW vectors for dot product
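The decomposition in step 2 can be illustrated in software: a K x K x W filter window is split into K*K vectors of length W, each handled as one dot product, and the partial results are accumulated. This is a sketch of the idea only. The data layout (`window[ky][kx]` holding a depth vector) and the function names are assumptions, not the accelerator's actual interface.

```python
def dot(a, b):
    """Dot product of two equal-length vectors (one 1x1xW slice each)."""
    return sum(p * q for p, q in zip(a, b))


def window_output_by_depth_vectors(window, filt):
    """One output value, computed as a sum of 1x1xW dot products.

    window[ky][kx] and filt[ky][kx] are depth vectors of length W.
    """
    acc = 0
    for win_row, filt_row in zip(window, filt):
        for win_vec, filt_vec in zip(win_row, filt_row):
            acc += dot(win_vec, filt_vec)  # one 1x1xW tile of work
    return acc


def window_output_direct(window, filt):
    """Reference: plain triple loop over ky, kx and depth."""
    total = 0
    for ky in range(len(filt)):
        for kx in range(len(filt[0])):
            for c in range(len(filt[0][0])):
                total += window[ky][kx][c] * filt[ky][kx][c]
    return total


# 2x2 window with depth W = 3, small integers for easy checking.
window = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [1, 0, 1]]]
filt = [[[1, 0, 1], [0, 1, 0]], [[1, 1, 1], [2, 2, 2]]]
result = window_output_by_depth_vectors(window, filt)
```

Because every tile is the same length-W dot product, the same hardware can be reused for any filter size by changing only how many tiles are accumulated.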
AlexNet Competitive Analysis — Classification
System (Precision, Image, Speed)¹                      Throughput   Est. Board Power   Throughput / Watt
Arria 10-115 (Current: FP32, Full Size, @275 MHz)      575 img/s    ~31W               18.5 img/s/W
Arria 10-115 (Optimized: FP32, Full Size, @350 MHz)    750 img/s    ~36W               20.8 img/s/W
Arria 10-115 (Estimate: FP16, Full Size, @350 MHz)     900 img/s    ~39W               23.1 img/s/W
Arria 10-115 (Estimate: 21b, Full Size, @350 MHz)      1200 img/s   ~40W               30 img/s/W
2 x Arria 10-115, Nallatech 510T Board                 2400 img/s   ~75W               32 img/s/W
cuDNN4 on NVIDIA Titan X                               3216 img/s   227W               14.2 img/s/W
Source for the Titan X row: NVIDIA Corporation, GPU-Based Deep Learning Inference: A Performance and Power Analysis, November 2015
• Further algorithmic optimization of FPGA possible
• Expect similar ratios for Stratix10 vs. NVIDIA 14nm Pascal
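The Throughput / Watt column can be reproduced from the other two columns. The sketch below recomputes it from the throughput and estimated board power values in the table. The row labels are shortened for readability, and the final ratio is simple division of two table entries, not an additional measurement.

```python
# (system, throughput in img/s, estimated board power in W), taken from the table.
rows = [
    ("Arria 10-115, FP32 @ 275 MHz (current)", 575, 31),
    ("Arria 10-115, FP32 @ 350 MHz (optimized)", 750, 36),
    ("Arria 10-115, FP16 @ 350 MHz (estimate)", 900, 39),
    ("Arria 10-115, 21b @ 350 MHz (estimate)", 1200, 40),
    ("2 x Arria 10-115 (Nallatech 510T)", 2400, 75),
    ("cuDNN4 on NVIDIA Titan X", 3216, 227),
]

# Efficiency is throughput divided by board power.
efficiency = {name: img_s / watts for name, img_s, watts in rows}

for name, value in efficiency.items():
    print(f"{name}: {value:.1f} img/s/W")

best_fpga = efficiency["2 x Arria 10-115 (Nallatech 510T)"]
gpu = efficiency["cuDNN4 on NVIDIA Titan X"]
ratio = best_fpga / gpu
print(f"Dual-FPGA board vs. Titan X efficiency ratio: {ratio:.2f}x")
```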
Getting Started with CNNs on FPGAs
High-Performance Machine Learning Desired
Accelerate Computation (Current Intel PSG focus, original MSFT focus)
• Scale & Speed of Devices
• Better Compute Architecture
• Math Optimization (Winograd, FFT)
• Optimized RTL / HLD
Tune Problem to Platform (Current i-Abra and partner focus)
• Simplify network topology
• Reduce precision / use fixed point
• Create more local neuron structures
• Integrated training and classification
Not Mutually Exclusive: Combine for Optimal Solution
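As an illustration of the "Math Optimization (Winograd, FFT)" item, the sketch below shows the standard 1D Winograd minimal filtering form F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of six. The slides only name the technique. This textbook form and the function names are illustrative and are not claimed to match the Intel PSG implementation.

```python
from fractions import Fraction


def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g over four inputs d, using 4 multiplies.

    Computes y0 = d0*g0 + d1*g1 + d2*g2 and y1 = d1*g0 + d2*g1 + d3*g2.
    The filter-side sums can be precomputed once per filter.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = (Fraction(v) for v in g)
    gp = (g0 + g1 + g2) / 2   # precomputable filter transform
    gm = (g0 - g1 + g2) / 2   # precomputable filter transform
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * gp
    m3 = (d2 - d1) * gm
    m4 = (d1 - d3) * g2
    return m1 + m2 + m3, m2 - m3 - m4


def direct_f23(d, g):
    """Reference: the same two outputs with 6 multiplies."""
    return (d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2])


data = [1, 2, 3, 4]
taps = [5, 6, 7]
fast = winograd_f23(data, taps)
slow = direct_f23(data, taps)
```

`Fraction` keeps the halved filter terms exact so the two results can be compared directly. A hardware version would use fixed- or floating-point arithmetic instead.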
Overview: Design Flow Using CNN IP
[Figure: flow diagram. Stages: Data Collection, Data Store, Choose Network, Train Network, Execution Engine. Other labels: Parameters, Selection, Architecture, Altera CNN IP]
Improvement Strategies
• Collect more data
• Improve network
Choose Network
• Use framework (e.g. Caffe, Torch)
• Choose based on experience or limits of execution engine
Train Network
• An HPC workload
• Requires data to be pre-selected
• Weeks to Months process
Execution Engine
• Implementation of the Neural Network
• Flexibility, performance & power dominate choice
Overview: Design Flow for CNN Using Partner
[Figure: flow diagram. Stages: Data Collection, Data Store, Neural Pathways, Neural Synapse. Other labels: Parameters, Selection, Architecture, Altera CNN IP]
Neural Pathways
• Integrated Network selection and training
• Capable of acceleration in FPGA
• Minutes to hours process
Neural Synapse
• Implementation of highly efficient Neural Network
• Built in FPGA fabric with OpenCL
Join Us on Our Journey Together…
How can Intel + Altera help your business grow?
• New opportunities to increase the FPGA value proposition
• Accelerated FPGA investment driving product innovation to increase your performance and productivity
• Increased operational excellence to accelerate time-to-market
• Expanded product portfolio to arm you with new solutions for your most challenging applications
• Come join us at our booth to see a demo of machine learning on FPGAs
Resources
• Altera Website
  • Altera SDK for OpenCL Page (www.altera.com/opencl)
  • Technical Article “Efficient Implementation of Neural Network Systems Built on FPGAs, Programmed with OpenCL” (www.altera.com/deeplearning-tech-article)
  • GPU vs FPGA overview online training (available mid-May)
  • CNN on FPGA whitepaper (available mid-May)
  • “Machine Learning on FPGAs” web page (available mid-May)
• Embedded Vision Alliance Website
  • Technical Article “OpenCL Streamlines FPGA Acceleration of Computer Vision”
Legal Notices and Disclaimers
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
© Intel Corporation
Slide 18 (AlexNet Competitive Analysis), Footnote 1. Configurations:
AlexNet configurations on Arria 10-115 FPGAs optimized via IP - tested by Intel PSG
For more information go to https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/pt/arria-10-product-table.pdf
Thank You

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel

  • 1. Copyright © 2016 Intel Corporation 1 Accelerating Deep Learning Using Altera FPGAs Bill Jenkins May 3, 2016
  • 2. Legal Notices and Disclaimers
      • Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure.
      • Tests document performance of components on a particular test, in specific systems. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
      • Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
      • All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
      • Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K.
      • The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
      • No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
      • Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
      • Intel, the Intel logo, and Xeon and others are trademarks of Intel Corporation in the U.S. and/or other countries.
      • *Other names and brands may be claimed as the property of others.
  • 3. Altera and Intel Enhance the FPGA Value Proposition
      • Accelerated FPGA innovation from combined R&D scale
      • Improved FPGA performance/power via early access and greater optimization of process node advancements
      • New, breakthrough Data Center and IoT products harnessing combined FPGA + CPU expertise
      [Slide labels: Accelerated FPGA investment; Operational excellence; Strategic rationale]
      • Superior product design capabilities
      • Continued excellence in customer service and support
      • Increased resources bolster long-term innovation
      • Focused, additive investments today
  • 4. What is Machine Learning?
      • Extracting features from data in order to solve predictive problems
          • Image classification & detection
          • Image recognition/tagging
          • Network intrusion detection
          • Fraud / face detection
      • The aim is programs that automatically learn to recognize complex patterns and make intelligent decisions based on insight generated from learning
      • For accuracy, models must be trained, tested and calibrated to detect patterns using previous experience
  • 5. When to Apply Machine Learning
      • Human expertise is absent
          • Navigating to Pluto
      • Humans cannot explain their expertise
          • Speech recognition
      • Solution changes over time
          • Tracking traffic
      • Solution needs to be adapted to particular cases
          • Medical diagnosis
      • Problem is vast in relation to human reasoning capabilities
          • Ranking web pages on Google or Bing
  • 6. Value Proposition of Machine Learning
      • Data is the problem
      • Volume x Velocity = Throughput
      • Separating Signal from Noise Provides Value
      • Revenue Growth, Cost Savings, Increased Margin
      [Figure labels: Increasing Variety of Things; 35 ZB/s]
  • 7. Convolutional Neural Networks (CNN)
      • A network of interconnected neurons, modeled after biological processes, for computing approximate functions
      • Layers extract successively higher-level features
      • A custom topology is often wanted to meet specific application accuracy/throughput requirements
      Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based Learning Applied to Document Recognition. Proceedings of the IEEE, 1998.
  • 8. CNN Computation in One Slide
      $$I_{new}(x, y) = \sum_{x'=-1}^{1} \sum_{y'=-1}^{1} I_{old}(x + x', y + y') \times F(x', y')$$
      • Input Feature Map (Set of 2D Images)
      • Filter (3D Space)
      • Output Feature Map
      • Repeat for Multiple Filters to Create Multiple “Layers” of Output Feature Map
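The sum on slide 8 can be written out directly. The following is a minimal sketch for a single 2D input and a single 3x3 filter; the function name `convolve3x3`, the `image[y][x]` indexing, and the choice to skip border pixels are assumptions made for illustration, since the slide does not specify edge handling.

```python
def convolve3x3(image, kernel):
    """Apply I_new(x, y) = sum over x', y' in {-1, 0, 1} of I_old(x + x', y + y') * F(x', y').

    image  -- 2D list indexed as image[y][x]
    kernel -- 3x3 list indexed as kernel[y' + 1][x' + 1]
    Border pixels are skipped (an assumption), so the output is two rows
    and two columns smaller than the input.
    """
    height, width = len(image), len(image[0])
    output = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += image[y + dy][x + dx] * kernel[dy + 1][dx + 1]
            row.append(acc)
        output.append(row)
    return output


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
box_filter = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = convolve3x3(image, box_filter)  # 2x2 output for a 4x4 input
```

Repeating the call with a different filter yields another output feature map, which is the "multiple filters, multiple layers" step the slide mentions.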
  • 9. What’s in my FPGA?
      • DSPs
          • Dedicated single-precision floating-point multipliers and accumulators
      • Block RAMs
          • Small embedded memories that can be stitched to form an arbitrary memory system
      • Programmable Interconnect
          • Programmable logic and routing that can build arbitrary topologies
      • Compute architecture with high degree of customization
  • 10. Why an FPGA for CNN? (Arria 10)
      • 1 TFLOPS floating point performance in mid-range part
          • 35W total device power
      • Use every DSP, every clock cycle: compute spatially
      • 8 TB/s memory bandwidth to keep the state on chip!
          • Exceeds available external bandwidth by factor of 50
          • Random access, low latency (2 clks)
      • Place all data in on-chip memory: compute temporally
      • Fine-grained & low latency between compute and memory
      [Figure labels: multiply-add blocks; M20K memory blocks]
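The figures on slide 10 imply two derived numbers that the slide does not state outright. The short calculation below is only arithmetic on the quoted values; the variable names are illustrative.

```python
# Values quoted on the slide.
on_chip_bandwidth_tb_s = 8.0   # on-chip memory bandwidth, TB/s
bandwidth_ratio = 50.0         # on-chip bandwidth / external bandwidth
peak_tflops = 1.0              # floating point performance, TFLOPS
device_power_w = 35.0          # total device power, W

# Derived values (not stated on the slide).
implied_external_gb_s = on_chip_bandwidth_tb_s * 1000.0 / bandwidth_ratio
gflops_per_watt = peak_tflops * 1000.0 / device_power_w
```

Under these numbers the external memory bandwidth works out to roughly 160 GB/s, and peak efficiency to roughly 28.6 GFLOPS per watt.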
  • 11. CNNs on FPGAs: Scalable Architecture
  • 12. Market Demands Scalability for Machine Learning
      • Cloud Analytics
          • 1000s of Classes
          • Large Workloads
          • Highly Efficient (Performance / W)
          • Varying accuracy
          • Server Form Factor
      • Transportation Safety
          • < 10 Classes
          • Frame Rate: 15–30 fps
          • Power: 1–5 W
          • Cost: Low
          • Varying accuracy
          • Custom Form Factor
  • 13. Different Parallelism in CNN
      • Old Approach
          • Parallelism across the “face” of the kernel window, and across multiple convolution stages
          • Low hardware re-use
      • New Approach
          • Parallelism in the depth of the kernel window and across output features
          • Defer complex spatial math to random access memory
          • Re-use hardware to compute multiple layers
  • 14. Scalable CNN Computations: In One Slide
      • “Slide” → No data movement. Addressing an on-chip RAM!
      [Figure labels: Filters; accum (x3); Output Feature Map]
  • 15. Scalable CNN Architecture on FPGA (1)
      [Figure labels: FPGA; DDR; Double-Buffer On-Chip RAM; Filters (on-chip RAM); # of Parallel Convolutions]
  • 16. Scalable CNN Architecture on FPGA (2)
      • Array size (x, y)
      • Clock rate
      • External memory bandwidth
      Calculated throughput & resource utilization
      • Layer descriptions
      • Given resource constraints, find optimal architecture
      • Ex. AlexNet on A10-115 is 52x26 for 800 img/s @ 350 MHz
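Slide 16 describes choosing an array size from layer descriptions and resource constraints. The sketch below shows one way such a search could look; it is not the presenter's tool. The cost model (x lanes split the input depth, y lanes split the output features, one DSP per array cell), the toy `LAYERS` values, and the names `cycles_per_image` and `best_array` are all assumptions, and external memory bandwidth is ignored for brevity.

```python
from math import ceil

# Hypothetical layer descriptions:
# (input channels, output channels, output height, output width, kernel size).
# Toy values for illustration, not AlexNet.
LAYERS = [
    (3, 32, 32, 32, 3),
    (32, 64, 16, 16, 3),
    (64, 128, 8, 8, 3),
]


def cycles_per_image(x, y, layers):
    """Estimate cycles, assuming x lanes split input depth and y lanes split output features."""
    total = 0
    for c_in, c_out, h, w, k in layers:
        total += ceil(c_in / x) * ceil(c_out / y) * h * w * k * k
    return total


def best_array(dsp_budget, clock_hz, layers, max_dim=64):
    """Exhaustively search array sizes whose x * y fits the assumed DSP budget."""
    best = None
    for x in range(1, max_dim + 1):
        for y in range(1, max_dim + 1):
            if x * y > dsp_budget:
                continue
            images_per_s = clock_hz / cycles_per_image(x, y, layers)
            if best is None or images_per_s > best[2]:
                best = (x, y, images_per_s)
    return best


x, y, throughput = best_array(dsp_budget=512, clock_hz=350e6, layers=LAYERS)
```

A real model would also need to account for M20K usage and external bandwidth, both of which the slide lists as inputs.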
  • 17. Scalable CNN Architecture on FPGA (3)
      • Choice of parallelism has large impact on end compute architecture and properties of solution
      • Defined a scalable approach to CNNs on the FPGA
          • Not tied to specific FPGA device
          • Not tied to specific CNN topology
      • Design Methodology:
          1. Fit largest possible accelerator network on FPGA (52x26 on Arria 10)
              • Limited by DSP Blocks & M20K (RAM) Resources
          2. Tile network onto available accelerator
              • Decompose filter window into 1x1xW vectors for dot product
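The decomposition of a filter window into 1x1xW vectors can be sketched as follows. This is an illustrative software model, not the hardware design: the data layout `feature_map[y][x][channel]` and the function names are assumptions. Each filter position contributes one depth-wise dot product, and moving the window only changes which address is read.

```python
def dot(a, b):
    """Dot product of two equal-length vectors (the 1x1xW operation)."""
    return sum(p * q for p, q in zip(a, b))


def conv_output_pixel(feature_map, filt, out_y, out_x):
    """Compute one output value by accumulating one depth-wise dot product per filter position.

    feature_map -- indexed as feature_map[y][x][channel]
    filt        -- indexed as filt[ky][kx][channel]
    """
    acc = 0.0
    for ky in range(len(filt)):
        for kx in range(len(filt[0])):
            # Each step reads one 1x1xW vector from each operand; sliding the
            # window only changes the address (out_y + ky, out_x + kx).
            acc += dot(feature_map[out_y + ky][out_x + kx], filt[ky][kx])
    return acc


# 3x3 feature map with 2 channels, and a 2x2 filter with 2 channels.
feature_map = [[[y + x, 1] for x in range(3)] for y in range(3)]
filt = [[[1, 2] for _ in range(2)] for _ in range(2)]
top_left = conv_output_pixel(feature_map, filt, 0, 0)
bottom_right = conv_output_pixel(feature_map, filt, 1, 1)
```

In this model the inner `dot` is the unit of work that would map onto a row of DSP blocks, while the outer loops correspond to successive RAM addresses.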
  • 18. AlexNet Competitive Analysis: Classification
      System (Precision, Image, Speed) [1] | Throughput | Est. Board Power | Throughput / Watt
      Arria 10-115 (Current: FP32, Full Size, @275 MHz) | 575 img/s | ~31W | 18.5 img/s/W
      Arria 10-115 (Optimized: FP32, Full Size, @350 MHz) | 750 img/s | ~36W | 20.8 img/s/W
      Arria 10-115 (Estimate: FP16, Full Size, @350 MHz) | 900 img/s | ~39W | 23.1 img/s/W
      Arria 10-115 (Estimate: 21b, Full Size, @350 MHz) | 1200 img/s | ~40W | 30 img/s/W
      2 x Arria 10-115 Nallatech 510T Board | 2400 img/s | ~75W | 32 img/s/W
      cuDNN4 on NVIDIA Titan X | 3216 img/s | 227W | 14.2 img/s/W
      Titan X source: NVIDIA Corporation, GPU-Based Deep Learning Inference: A Performance and Power Analysis, November 2015
      • Further algorithmic optimization of FPGA possible
      • Expect similar ratios for Stratix10 vs. NVIDIA 14nm Pascal
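The throughput-per-watt column on slide 18 is simply throughput divided by estimated board power. The short check below recomputes it from the slide's own numbers; the shortened system labels are paraphrased for readability.

```python
# (system, throughput in img/s, estimated board power in W), copied from the slide.
RESULTS = [
    ("Arria 10-115, FP32 @ 275 MHz (current)", 575, 31),
    ("Arria 10-115, FP32 @ 350 MHz (optimized)", 750, 36),
    ("Arria 10-115, FP16 @ 350 MHz (estimate)", 900, 39),
    ("Arria 10-115, 21b @ 350 MHz (estimate)", 1200, 40),
    ("2 x Arria 10-115, Nallatech 510T", 2400, 75),
    ("cuDNN4 on NVIDIA Titan X", 3216, 227),
]

# Efficiency in img/s/W, rounded to one decimal place as on the slide.
efficiency = {name: round(img_s / watts, 1) for name, img_s, watts in RESULTS}
```

The recomputed values agree with the slide to one decimal place.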
  • 19. Getting Started with CNNs on FPGAs
      High-Performance Machine Learning Desired
      • Accelerate Computation (Current Intel PSG focus, original MSFT focus)
          • Scale & Speed of Devices
          • Better Compute Architecture
          • Math Optimization (Winograd, FFT)
          • Optimized RTL / HLD
      • Tune Problem to Platform (Current i-Abra and partner focus)
          • Simplify network topology
          • Reduce precision / use fixed point
          • Create more local neuron structures
          • Integrated training and classification
      • Not Mutually Exclusive: Combine for Optimal Solution
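Slide 19 names Winograd as one math optimization. As a rough illustration of the idea (not the presenter's implementation), the sketch below shows the standard one-dimensional Winograd minimal filtering case F(2, 3): two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The function names are illustrative.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter g over four inputs d, using 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """The same two outputs using 4 multiplications (Winograd F(2, 3))."""
    # Filter transform; in a CNN this can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Four multiplications on transformed inputs.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform uses only additions and subtractions.
    return [m1 + m2 + m3, m2 - m3 - m4]


inputs = [1, 2, 3, 4]
taps = [2, 4, 6]
```

Fewer multiplications per output matters on an FPGA because multipliers (DSP blocks) are the scarce resource, while the extra additions map onto ordinary logic.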
  • 20. Overview: Design Flow Using CNN IP
      [Figure labels: Data Collection; Data Store; Choose Network; Train Network; Execution Engine; Parameters; Selection; Architecture; Altera CNN IP]
      • Improvement Strategies
          • Collect more data
          • Improve network
      • Choose Network
          • Use framework (e.g. Caffe, Torch)
          • Choose based on experience or limits of execution engine
      • Train Network
          • An HPC workload
          • Requires data to be pre-selected
          • Weeks to months process
      • Execution Engine
          • Implementation of the Neural Network
          • Flexibility, performance & power dominate choice
  • 21. Overview: Design Flow for CNN Using Partner
      [Figure labels: Data Collection; Data Store; Neural Pathways; Neural Synapse; Parameters; Selection; Architecture; Altera CNN IP]
      • Neural Pathways
          • Integrated Network selection and training
          • Capable of acceleration in FPGA
          • Minutes to hours process
      • Neural Synapse
          • Implementation of highly efficient Neural Network
          • Built in FPGA fabric with OpenCL
  • 22. Join Us on Our Journey Together…
      How can Intel + Altera help your business grow?
      • New opportunities to increase the FPGA value proposition
      • Accelerated FPGA investment driving product innovation to increase your performance and productivity
      • Increased operational excellence to accelerate time-to-market
      • Expanded product portfolio to arm you with new solutions for your most challenging applications
      • Come join us at our booth to see a demo of machine learning on FPGAs
  • 23. Resources
      • Altera Website
          • Altera SDK for OpenCL Page (www.altera.com/opencl)
          • Technical Article “Efficient Implementation of Neural Network Systems Built on FPGAs, Programmed with OpenCL” (www.altera.com/deeplearning-tech-article)
          • GPU vs FPGA overview online training (available mid-May)
          • CNN on FPGA whitepaper (available mid-May)
          • “Machine Learning on FPGAs” web page (available mid-May)
      • Embedded Vision Alliance Website
          • Technical Article “OpenCL Streamlines FPGA Acceleration of Computer Vision”
  • 24. Legal Notices and Disclaimers
      Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com.
      Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
      © Intel Corporation
      Slide 18 Footnote 1. Configurations: AlexNet configurations on Arria 10-115 FPGAs optimized via IP, tested by Intel PSG. For more information go to https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/pt/arria-10-product-table.pdf
  • 25. Thank You