VIETNAM NATIONAL UNIVERSITY HANOI (VNU)
VNU UNIVERSITY OF ENGINEERING AND TECHNOLOGY
Low Area ANN Architecture with Stochastic Computing and a Simplified Sigmoid Function
Huy-Hung Ho, Xuan-Thuan Nguyen, Van-Thuat Nguyen, Van-Dung Nguyen
Key Laboratory for Smart Integrated Systems (SISLAB),
VNU University of Engineering and Technology (VNU-UET)
4/20/2018
Context and Motivation
• Backpropagation algorithm
  – Large models
  – High complexity
  – Training time
• Hardware implementation
  – Big systems
  – IoT devices
• Constraints
  – Small area
  – Low power
  – Lifecycle
• Goal: reduce area cost
Outline
• Context and Motivation
• Proposed design
• Evaluation
• Conclusion
Proposed Design
• Reference model of LSI Contest
  – Fixed-size 2-3-2 architecture
  – Sigmoid function LUT: high area cost
  – Complex operators
• Proposed design
  – Parameterized ANN architecture (backpropagation)
  – Optimized sigmoid function
    • Lower LUT memory
  – Forward ANN uses stochastic computing
    • Additions and multiplications are replaced by logic gates
• Result: low area cost, high frequency
[Figure: neuron datapath. The weighted input (additions, multiplications) feeds the sigmoid function $\sigma(x) = \frac{1}{1 + e^{-x}}$.]
Optimized Sigmoid Function
• Conventional methods
  – PWL (piecewise linear approximation)
  – Lookup table (8-bit LUT)
  – Separating different regions
    • Only for the tanh() function
• Proposed sigmoid function
  – 3 different regions
    • Constant region
    • Linear region
    • Non-linear region
  – 5-bit LUT & 3-bit LUT
• Evaluation

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=0}^{N-1} \left( O_{\text{floating\_point},i} - O_{\text{proposed},i} \right)^2$$

MSE: mean square error

               Conventional LUT sigmoid    Proposed sigmoid
MSE            3.12 × 10^-7                1.79 × 10^-5
Area (cell)    85                          67 (-21.18%)
Forward ANN Using Stochastic Computing
• Stochastic computing (SC)
  – Processes data in the form of digitized probabilities
• Stochastic number generation (SNG)
  – A comparator (>) checks the input value against the LFSR output
  – Example: the value 2/8 becomes the bitstream 1,0,0,1,0,0,0,0, in which the probability of a '1' bit is 2/8
• SC multiplication
• SC scaled addition: $z = (x + y)/2$
[Figure: the weighted input (additions, multiplications) is moved into the stochastic computing domain and built from logic gates and a MUX; it is followed by the sigmoid function $\frac{1}{1 + e^{-x}}$.]
LFSR: Linear Feedback Shift Register [Lee2017eeh]
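The three SC primitives listed above can be tried out in software. The following is a minimal sketch, not the authors' VHDL: it assumes unipolar encoding, an 8-bit Fibonacci LFSR with taps 8, 6, 5, 4 (a standard maximal-length choice, not taken from the slides), an AND gate for multiplication, and a MUX whose select line toggles every cycle for scaled addition. The function names and seeds are hypothetical.

```python
from itertools import islice


def lfsr8(seed):
    """8-bit Fibonacci LFSR with taps 8, 6, 5, 4 (assumed, maximal length)."""
    state = seed & 0xFF
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    while True:
        yield state
        # XOR of the tapped bits becomes the new least significant bit.
        fb = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
        state = ((state << 1) | fb) & 0xFF


def sng(value, seed, length):
    """Stochastic number generator: emit 1 whenever value > LFSR state."""
    return [1 if value > r else 0 for r in islice(lfsr8(seed), length)]


def sc_mul(xs, ys):
    """Unipolar SC multiplication: bitwise AND of two streams."""
    return [x & y for x, y in zip(xs, ys)]


def sc_scaled_add(xs, ys):
    """Scaled addition z = (x + y) / 2: a MUX with a toggling select line."""
    return [x if t % 2 == 0 else y for t, (x, y) in enumerate(zip(xs, ys))]


def prob(bits):
    """Fraction of '1' bits, i.e. the value the stream encodes."""
    return sum(bits) / len(bits)


LENGTH = 2 ** 8                           # 8-bit SC length
x = sng(64, seed=0x01, length=LENGTH)     # encodes about 64/256 = 2/8
y = sng(192, seed=0x5A, length=LENGTH)    # encodes about 192/256 = 6/8
print("x      ~", prob(x))
print("y      ~", prob(y))
print("x * y  ~", prob(sc_mul(x, y)))
print("(x+y)/2 ~", prob(sc_scaled_add(x, y)))
```

The two generators use different seeds so that the streams are not identical; correlated streams would make the AND gate return a biased product.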
Forward ANN Using Stochastic Computing (cont.)
[Figure: conventional architecture versus proposed SC architecture. Blocks shown in the SC version: SNGs, T flip-flops (T, Q, Q̄), and MUXes. Signals: $a^2_1$, $w^3_1$, $a^2_2$, $w^3_2$, $a^2_3$, $w^3_3$, $b^3$, $z^3$.]

Forward ANN simulation with SC length

SC length    8-bit                      10-bit
Latency      $2 \times 2^{8} + 2$       $2 \times 2^{10} + 2$
MSE          $2.03 \times 10^{-4}$      $7.70 \times 10^{-5}$

MSE: mean square error
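To make the trade-off in this table concrete, the short sketch below evaluates the latency entries, reading them as 2 × 2^n + 2 for an n-bit SC length, and compares them with the reported MSE values. The slide does not state the latency unit, and the helper name `sc_latency` is hypothetical.

```python
def sc_latency(sc_bits: int) -> int:
    """Latency entry from the table: 2 * 2**n + 2 for an n-bit SC length."""
    return 2 * 2 ** sc_bits + 2


# MSE values copied from the table above.
mse = {8: 2.03e-4, 10: 7.70e-5}

for n in (8, 10):
    print(f"{n}-bit SC length: latency = {sc_latency(n)}, MSE = {mse[n]:.2e}")

latency_ratio = sc_latency(10) / sc_latency(8)
mse_ratio = mse[8] / mse[10]
print(f"latency grows by {latency_ratio:.2f}x, MSE shrinks by {mse_ratio:.2f}x")
```

Moving from an 8-bit to a 10-bit SC length therefore costs roughly four times the latency (514 versus 2050) for an MSE that is roughly 2.6 times lower.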
Evaluation
2-3-2 forward ANN architecture evaluation

                   LSI Contest    Our proposed (8-bit SC length)
Frequency (MHz)    182            357
Slice LUTs         814            738
Slice Registers    528            450
Mux                60             75
DSP                12             0
Area (cell)        2059           1627 (-20.98%)
Latency            4              $2 \times 2^{8} + 2$

• Environment
  – VHDL language
  – Xilinx Vivado tool
  – Xilinx VC707 FPGA, 28 nm
• 2-3-2 forward model
  + Low area cost
  + High frequency
  + No DSP usage
  - High latency
• Verification implementation
  – Input (real value) → real to fixed-point → proposed design → fixed-point to real
  – The converted output is compared with the result of the math equation through the MSE

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=0}^{N-1} \left( a_i - b_i \right)^2$$

MSE: mean square error
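The verification flow above (real input, fixed-point conversion, design, conversion back, MSE against the math equation) can be sketched in software. This is only an illustration under stated assumptions: the number of fractional bits (`FRAC_BITS = 10`) is not given in the slides, and `model_under_test` is a hypothetical stand-in for the hardware design, here an exact sigmoid whose output is quantized.

```python
import math

FRAC_BITS = 10  # assumption: the slides do not give the fixed-point format


def to_fixed(x: float) -> int:
    """Real to fixed-point: scale by 2**FRAC_BITS and round."""
    return round(x * (1 << FRAC_BITS))


def to_real(q: int) -> float:
    """Fixed-point to real."""
    return q / (1 << FRAC_BITS)


def sigmoid(x: float) -> float:
    """Reference math equation."""
    return 1.0 / (1.0 + math.exp(-x))


def model_under_test(q: int) -> int:
    """Hypothetical stand-in for the design: fixed-point in, fixed-point out."""
    return to_fixed(sigmoid(to_real(q)))


def mse(a, b) -> float:
    """Mean square error between two equally long sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


inputs = [i / 8 for i in range(-40, 41)]          # real-valued test inputs
reference = [sigmoid(x) for x in inputs]           # math equation
measured = [to_real(model_under_test(to_fixed(x))) for x in inputs]
print("MSE =", mse(reference, measured))
```

In a real test bench the stand-in would be replaced by values captured from the simulated or synthesized design.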
Conclusion
• Contest requirement
  – 2-3-2 ANN architecture with 5 sets of inputs
• Proposed design
  – Parameterized 2-3-2 ANN architecture
  – Simplified sigmoid architecture
    • 21.18% lower area cost
  – Forward architecture using stochastic computing
    • 20.98% lower area cost (compared to the contest reference model)
  – Limitation: high latency
• Future work
  – Apply stochastic computing to the backward module
  – Further reduce the area cost
Thank you for your attention
Backup slide
Simplified Sigmoid Function Architecture
[Figure: block diagram. A decoder on the input drives a multiplexer that selects among the constants 1.0, 0.984375, 0.015625 and 0.0, four LUTs (bus labels 3 and 5), and the linear path $0.245x + 0.5$; the remaining buses are labelled 16. Ports: Input, Input(0), Input(1), Output.]
• A multiplexer divides and selects the regions
• Two 32-bit LUTs and two 8-bit LUTs cover the ranges [−3.5; −1) and (1; 3.5]
• Multiplication is replaced by an approximate 2-bit right shift

Weighted input range    Decode selector    Output
(4.5; +∞)               000000             0.999023
(3.5; 4.5]              100000             0.984375
(1; 3.5]                110000             LUT
(−1; 1)                 111000             $y = 0.245x + 0.5$
[−3.5; −1)              111100             LUT
[−4.5; −3.5)            111110             0.015625
(−∞; −4.5)              111111             0.0
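The region table can be expressed as a small software model. This is a sketch rather than the authors' design: the LUT contents are not given in the slides, so the exact sigmoid stands in for both LUT regions; the sixth row is read as the negative mirror of the second row, i.e. [−4.5; −3.5); and the boundary points x = 1 and x = −1, which the table leaves open, are assigned to the linear region. The function names are hypothetical.

```python
import math


def exact_sigmoid(x: float) -> float:
    """Floating-point reference, also used here in place of the LUT contents."""
    return 1.0 / (1.0 + math.exp(-x))


def simplified_sigmoid(x: float, lut=exact_sigmoid) -> float:
    """Piecewise model following the region table (boundaries partly assumed)."""
    if x > 4.5:
        return 0.999023          # upper constant region
    if x > 3.5:
        return 0.984375          # second constant region
    if x > 1.0:
        return lut(x)            # non-linear region, positive side
    if x >= -1.0:
        # Linear region; in hardware 0.245*x is approximated by x >> 2.
        return 0.245 * x + 0.5
    if x >= -3.5:
        return lut(x)            # non-linear region, negative side
    if x >= -4.5:
        return 0.015625          # lower constant region
    return 0.0


# Compare against the floating-point sigmoid on a uniform grid.
grid = [i / 100 for i in range(-800, 801)]
errors = [abs(simplified_sigmoid(x) - exact_sigmoid(x)) for x in grid]
print("max abs error:", max(errors))
print("MSE:", sum(e * e for e in errors) / len(errors))
```

Because the LUT regions use the exact function here, the printed error reflects only the constant and linear regions and is not comparable to the MSE reported for the hardware.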
2-3-2 Forward Architecture Using Stochastic Computing
[Figure: two block diagrams. Reference architecture of LSI Contest: weight and bias inputs feed weighted-input and activation-function blocks. Our proposed architecture using SC: a controller with clk, areset, start and input ports drives "SC weighted in hidden" and "SC weighted in output" blocks, each followed by a sigmoid function LUT; signals include weight hidden, bias hidden, start, finish, valid, weighted in result, activation result, and output.]


Editor's notes

  1. LeNet (1998) AlexNet (2012) OverFeat (2013)  VGGNet (2014)  GoogleNet (2014)  ResNet (2015)
  2. High-cost area