Designing Your Neural Networks:
A Step by Step Walkthrough
Lavanya Shukla
What's on the menu today?
How do Neural Networks
learn?
Take a whirlwind tour of
Neural Network architectures
Train Neural Networks
Optimize Neural Networks to
achieve SOTA performance
Weights & Biases
How You Can Train Your Own
Neural Nets
The Code: bit.ly/keras-neural-nets
The Goal For Today + Code
Basic Neural Network
Architecture
The Perceptron
Each neuron outputs the weighted sum of its
inputs (plus a bias), passed through an
activation function.
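As a toy illustration (not from the slides; all numbers are made up), a single neuron in Python: the weighted sum of its inputs plus a bias, passed through a ReLU activation.

```python
import numpy as np

# A single neuron: weighted sum of inputs plus a bias, then an activation.
def neuron(x, w, b):
    z = np.dot(w, x) + b      # weighted sum + bias
    return max(0.0, z)        # ReLU: the neuron only "fires" when z > 0

x = np.array([0.5, -1.2, 3.0])   # hypothetical inputs
w = np.array([0.4, 0.1, -0.2])   # hypothetical weights
print(neuron(x, w, b=0.1))       # prints 0.0 here: the weighted sum is negative
```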
How A Neural Network Learns
This is the number of features your neural
network uses to make its predictions.
The input vector needs one input neuron per
feature.
You want to carefully select these features
and remove any that may contain patterns
that won’t generalize beyond the training set
(and cause overfitting).
For images, this is the number of pixels in
your image (28*28 = 784 for MNIST).
Input neurons
How A Neural Network
Sees An Image
For images, the number of input neurons is
the number of pixels in your image
(28*28 = 784 for MNIST).
How A Neural Network Sees
An Image
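A minimal sketch of what that means in code, assuming the standard Keras MNIST loader: each 28*28 image becomes a 784-dimensional input vector, one input neuron per pixel.

```python
import tensorflow as tf

# Load MNIST and flatten each 28x28 image into 784 input features.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print(x_train.shape)                   # (60000, 28, 28)

x_flat = x_train.reshape(-1, 28 * 28)  # one input neuron per pixel
print(x_flat.shape)                    # (60000, 784)
```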
The convolution layer is made up of a set of
independent filters.
Each filter slides over the image and creates
feature maps that learn different aspects of
an image.
Convolutions
The pooling layer reduces the size of the
image representation, and with it the number
of parameters and computation in the
network.
Pooling
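An illustrative mini-CNN along those lines (a sketch assuming MNIST-shaped inputs, not the exact network from the talk): each Conv2D filter slides over the image to produce a feature map, and MaxPooling2D shrinks those maps to cut parameters and computation.

```python
import tensorflow as tf
from tensorflow.keras import layers

cnn = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(28, 28, 1)),   # 32 filters -> 32 feature maps
    layers.MaxPooling2D((2, 2)),              # halve each spatial dimension
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # 10 MNIST classes
])
cnn.summary()
```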
How The Network Sees An Image
Let's get back to talking about the
Basic Neural Network Architecture
This is the number of predictions you want to
make.
Regression
one neuron per predicted value
Classification
For binary classification (spam vs. not spam),
use a single output neuron for the positive class.
For multi-class classification (car, dog,
house), one output neuron per class, and
use the softmax activation function on the
output layer to ensure the final
probabilities sum to 1.
Output neurons
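In Keras, the output layer for each case could look like this (layer sizes are illustrative, not prescribed by the slides):

```python
from tensorflow.keras import layers

# Regression: one neuron per predicted value, no activation.
regression_head = layers.Dense(1)

# Binary classification (spam vs. not spam): one sigmoid neuron for the positive class.
binary_head = layers.Dense(1, activation="sigmoid")

# Multi-class classification (car, dog, house): one neuron per class,
# with softmax so the predicted probabilities sum to 1.
multiclass_head = layers.Dense(3, activation="softmax")
```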
The number of hidden layers is highly
dependent on the problem and the
architecture of your neural network.
You’re essentially trying to Goldilocks your
way into the perfect neural network
architecture – not too big, not too small, just
right.
Generally, 1-5 hidden layers will serve you
well for most problems.
Hidden Layers
In general, using the same number of
neurons for all hidden layers will suffice.
For some datasets, a large first layer
followed by smaller layers leads to better
performance, because the first layer can learn
many lower-level features that feed into a
few higher-order features in the subsequent
layers.
Hidden Layers - Tips
Usually you will get more of a performance
boost from adding more layers than adding
more neurons in each layer.
I’d recommend starting with 1-5 layers and
1-100 neurons and slowly adding more layers
and neurons until you start overfitting.
Hidden Layers - Tips
If the number of layers/neurons is too small,
your network will not be able to learn the
underlying patterns in your data and will be
useless.
An approach to counteract this is to start
with a huge number of hidden layers +
hidden neurons and then use dropout and
early stopping to let the neural network size
itself down for you.
Hidden Layers - Overfit First
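A hypothetical starting point in that spirit (a sketch, not the author's exact architecture): start bigger than you think you need and let dropout plus early stopping rein the network in.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),     # MNIST-style input
    layers.Dense(256, activation="relu"),     # generous first hidden layer
    layers.Dropout(0.3),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # output layer
])
```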
When working with image or speech data,
you’d want your network to have dozens to
hundreds of layers, not all of which need to
be fully connected.
For these use cases, there are pre-trained
models (YOLO, ResNet, VGG) that allow you
to use large parts of their networks, and train
your model on top of these networks to learn
only the higher order features.
In this case, your model will still have only a
few layers to train.
Hidden Layers - Images
Batch Size: Total number of training
examples present in a single batch.
Large batch sizes can be great because they
harness the power of GPUs to process more
training instances per unit of time.
Small batch sizes generalize better and are
less memory intensive.
If you’re not operating at massive scale, I
would recommend starting with a lower batch
size and slowly increasing it while
monitoring performance.
A good starting point is between 32 and 64.
Batch Size
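In Keras, the batch size is just an argument to fit. A sketch reusing the hypothetical `model` and the MNIST arrays from the earlier snippets:

```python
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(
    x_train / 255.0, y_train,   # scaled pixel values; see "Scaling your features" below
    batch_size=32,              # start between 32 and 64, then scale up while monitoring
    epochs=10,
    validation_split=0.2,
)
```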
Epochs: number of times your network sees
your data.
One epoch is when an entire dataset is
passed both forward and backward through
the neural network only once.
I’d recommend starting with a large number
of epochs and using Early Stopping to halt
training when performance stops improving.
Number of Epochs
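Sketch of that recipe, continuing the example above (the numbers are hypothetical): ask for far more epochs than you expect to need and let an EarlyStopping callback cut training short (covered in detail in the Early Stopping section).

```python
import tensorflow as tf

history = model.fit(
    x_train / 255.0, y_train,
    batch_size=32,
    epochs=500,    # deliberately large; Early Stopping decides when to stop
    validation_split=0.2,
    callbacks=[tf.keras.callbacks.EarlyStopping(patience=10)],
)
```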
Make sure all your features have similar scale
before using them as inputs to your neural
network.
This ensures faster convergence.
When your features have different scales (e.g.
salaries in thousands and years of experience
in tens), the cost function looks like an
elongated bowl, and your optimization
algorithm will take a long time to traverse
the valley compared to using normalized
features, where the cost function is closer
to a round bowl.
Scaling your features
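A quick sketch of feature scaling with scikit-learn's StandardScaler (one common choice; the salary/experience numbers are made up):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales: salary, years of experience.
X = np.array([[52000.0, 3.0],
              [89000.0, 12.0],
              [61000.0, 5.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)                    # fit the scaler on training data only
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))    # ~0 mean, unit variance
# At prediction time, reuse the same statistics: scaler.transform(X_new)
```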
Activation Functions
Decides if a neuron fires or not.
In general, the performance from using
different activation functions improves in this
order (from lowest→highest performing):
logistic → tanh → ReLU → Leaky ReLU → ELU
→ SELU.
to combat neural network overfitting: RReLU
reduce latency at runtime: leaky ReLU
for massive training sets: PReLU
for fast inference times: leaky ReLU
if your network doesn’t self-normalize: ELU
for an overall robust activation function: SELU
Activation Functions
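In Keras, most of these are simply the activation argument on a layer; leaky ReLU is its own layer. Illustrative snippet (layer sizes are arbitrary):

```python
from tensorflow.keras import layers

relu_layer = layers.Dense(64, activation="relu")
elu_layer  = layers.Dense(64, activation="elu")
tanh_layer = layers.Dense(64, activation="tanh")
selu_layer = layers.Dense(64, activation="selu",
                          kernel_initializer="lecun_normal")  # SELU pairs with LeCun init

# Leaky ReLU is applied as a separate layer after a linear Dense layer.
leaky_block = [layers.Dense(64), layers.LeakyReLU()]
```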
Loss Functions
Regression
Mean squared error is the most common
loss function to optimize for, unless there
are a significant number of outliers.
When you have a lot of outliers, use mean
absolute error.
Classification
Cross-entropy will serve you well in most
cases. It maximizes the likelihood of
classifying the input data correctly.
Loss Functions
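At compile time this is the loss argument. Continuing the hypothetical MNIST `model` from earlier (a softmax classifier), with the regression options shown as comments:

```python
# Multi-class classification: cross-entropy (the sparse variant, since MNIST labels are integers).
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# For a regression model (single linear output neuron) you would instead use:
#   model.compile(optimizer="adam", loss="mse")   # mean squared error
#   model.compile(optimizer="adam", loss="mae")   # mean absolute error, if outliers dominate
```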
Learning Rate
The amount by which the weights are updated
during training.
Picking the learning rate is very important,
and you want to make sure you get this right!
To find the best learning rate:
start with a very low value (e.g. 10^-6)
and slowly multiply it by a constant until
it reaches a very high value (e.g. 10).
Measure your model performance vs each
learning rate and use W&B to pick the best
one.
Learning Rate
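One hypothetical way to run that sweep (a sketch; the multiplier and range are up to you, and in practice you would log each run to W&B): train briefly at each rate from 10^-6 up to 10 and compare the losses.

```python
import tensorflow as tf

results = {}
for lr in [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]:
    trial = tf.keras.models.clone_model(model)   # fresh, re-initialized copy of the earlier model
    trial.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy")
    hist = trial.fit(x_train / 255.0, y_train, epochs=1, batch_size=64, verbose=0)
    results[lr] = hist.history["loss"][-1]

best_lr = min(results, key=results.get)          # the rate with the lowest training loss
print(results, best_lr)
```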
Weight Initialization
The right weight initialization method can
speed up time-to-convergence considerably.
The choice of your initialization method
depends on your activation function.
When using ReLU or leaky ReLU,
use He initialization
When using SELU or ELU,
use LeCun initialization
When using softmax, logistic, or tanh,
use Glorot initialization
Weight Initialization
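In Keras this is the kernel_initializer argument on each layer (illustrative layer sizes):

```python
from tensorflow.keras import layers

# ReLU / leaky ReLU -> He initialization
relu_hidden = layers.Dense(64, activation="relu", kernel_initializer="he_normal")

# SELU / ELU -> LeCun initialization
selu_hidden = layers.Dense(64, activation="selu", kernel_initializer="lecun_normal")

# softmax / logistic / tanh -> Glorot (Xavier) initialization, Keras's default
tanh_hidden = layers.Dense(64, activation="tanh", kernel_initializer="glorot_uniform")
```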
Early Stopping
Early Stopping lets you live it up by training
a model with more hidden layers, hidden
neurons and for more epochs than you need.
It then simply stops training when
performance hasn’t improved for n
consecutive epochs.
It saves the best performing model for you.
You can enable Early Stopping by setting up
an EarlyStopping callback when you fit your
model; pair it with a ModelCheckpoint
callback with save_best_only=True to keep
the best model.
Early Stopping
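A sketch of that setup (hypothetical patience and file name), continuing the earlier example: EarlyStopping halts training after n epochs without improvement, and ModelCheckpoint with save_best_only=True keeps the best model on disk.

```python
import tensorflow as tf

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("best_model.keras", monitor="val_loss",
                                       save_best_only=True),
]

history = model.fit(
    x_train / 255.0, y_train,
    epochs=500,                 # generous budget; Early Stopping decides when to stop
    validation_split=0.2,
    callbacks=callbacks,
)
```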
Dropout
gives you a massive performance boost
(~2% for SOTA models)
Very simple: randomly turns off a
percentage of neurons at each layer, at
each training step.
More robust network because it can’t rely
on any particular set of input neurons for
making predictions.
Around 2^n (where n is the number of
neurons in the architecture) slightly-
unique neural networks are generated
during the training process, and ensembled
together to make predictions.
Dropout
A good dropout rate is
between 0.1 and 0.5: 0.3 for RNNs,
0.5 for CNNs.
Use larger rates for bigger layers.
Increasing the dropout rate decreases
overfitting, and decreasing the rate is
helpful to combat under-fitting.
You definitely don’t want to use dropout in
the output layers.
AlphaDropout works well with the SELU
activation function because it preserves the
input’s mean and standard deviation.
Dropout
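Dropout is its own layer in Keras. A sketch using the rates above (layer sizes are arbitrary):

```python
import tensorflow as tf
from tensorflow.keras import layers

dropout_net = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),                      # randomly silences 30% of these activations each step
    layers.Dense(128, activation="selu", kernel_initializer="lecun_normal"),
    layers.AlphaDropout(0.1),                 # dropout variant that preserves SELU's self-normalization
    layers.Dense(10, activation="softmax"),   # no dropout on the output layer
])
```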
Optimizers
Use Stochastic Gradient Descent if you
care deeply about quality of convergence
and if time is not of the essence.
If you care about time-to-convergence
and a point close to optimal
convergence will suffice, experiment with
Adam, Nadam, RMSProp, and Adamax
optimizers.
Adam/Nadam are good starting points, and
tend to be quite forgiving of a bad
learning rate and other non-optimal
hyperparameters.
Optimizers
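The corresponding Keras optimizers (hyperparameters here are illustrative):

```python
import tensorflow as tf

# Quality of convergence over speed: plain SGD (optionally with momentum).
sgd = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

# Faster time-to-convergence: adaptive optimizers.
adam   = tf.keras.optimizers.Adam(learning_rate=1e-3)
nadam  = tf.keras.optimizers.Nadam(learning_rate=1e-3)
rms    = tf.keras.optimizers.RMSprop(learning_rate=1e-3)
adamax = tf.keras.optimizers.Adamax(learning_rate=1e-3)

model.compile(optimizer=adam, loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```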
Vanishing & Exploding
Gradients
Just like people, not all neural network
layers learn at the same speed.
When the backprop algorithm
propagates the error gradient from the
output layer to the first layers, the
gradients get smaller and smaller until
they’re almost negligible when they reach
the first layers.
This means the weights of the first layers
aren’t updated significantly at each
step.
This is the problem of vanishing gradients.
Vanishing & Exploding
Gradients
Experiment with:
Weight Initialization Method
Activation Function
Gradient Clipping: clip gradients when they
exceed a certain value
Early Stopping
Dropout
Optimizer - Adam, Nadam
BatchNorm
Learning Rate Scheduling
Vanishing & Exploding
Gradients
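A sketch combining a few of those remedies (hypothetical layer sizes and clip value): He initialization, BatchNorm between layers, and gradient clipping on the optimizer.

```python
import tensorflow as tf
from tensorflow.keras import layers

deep_net = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, kernel_initializer="he_normal"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dense(64, kernel_initializer="he_normal"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dense(10, activation="softmax"),
])

# clipvalue caps each gradient component; clipnorm would cap the overall gradient norm instead.
deep_net.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3, clipvalue=1.0),
                 loss="sparse_categorical_crossentropy",
                 metrics=["accuracy"])
```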
Now it's your turn! bit.ly/keras-neural-nets
Thank you!
Twitter.com/lavanyaai
http://lavanya.ai
