1. Introduction to Artificial Neural Network and Deep Learning
Pramod R
Senior Lead Data Scientist
Fidelity
https://twitter.com/getpramodr
https://www.linkedin.com/in/pramod-r-05b38212/
https://github.com/meetpramodr/
2. Agenda
● What is a neural network
● Popular use cases/examples of neural networks
● Deep dive:
○ Linear vs. non-linear separability
○ Components of a neural network: input, hidden, and output layers
○ What is inside the hidden layer: weights + bias and the activation
○ What is a loss function
○ Gradient descent and back propagation, with an example
○ Activation functions
● Hands-on Keras:
○ Build a feed-forward neural network for a classification problem
○ Measure the accuracy of the model
○ Save the model, and tips for deployment
3. What is a Neural Network
● Ever wondered how our brain maps different resolutions and styles of a handwritten digit to the exact number?
● Imagine writing a computer program that inspects pixel densities to classify a digit by their pattern of arrangement; it would be a highly non-trivial task.
4. Intro to Colab
Word of caution: If data is the new oil, then GPUs are truly its superbike engine
https://github.com/meetpramodr/Introduction-to-Artificial-Neural-Network-and-Deep-Learning
5. What is a Neural Network - contd..
NEURONS
- Cells that do some compute and hold numbers
- Also known as the activations
NETWORK
- Connects multiple neurons to form a network
- Transmits the stored numbers through edges
“A single neuron in the brain is an incredibly complex machine that even today we don’t understand. A single ‘neuron’ in a neural network is an incredibly simple mathematical function that captures a minuscule fraction of the complexity of a biological neuron. So to say neural networks mimic the brain, that is true at the level of loose inspiration, but really artificial neural networks are nothing like what the biological brain does.”
- Andrew Ng
6. Popular Use Cases of Neural Network
Siri
Tesla
Chatbots
Black hole imaging
Airbnb
7. Linear vs Non-linear Separability
A set of points belonging to, say, two different classes (blue and red) is linearly separable if there exists at least one line with all the blue dots on one side and all the red dots on the other. In the generalized, higher-dimensional case, we call the separating boundary a hyperplane.
If the points cannot be separated by a single straight line (without first mapping them to a higher dimension, e.g. via kernels), they are not linearly separable.
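To make separability concrete, here is a minimal plain-Python sketch of a single-layer perceptron on the classic AND and XOR toy datasets (chosen here for illustration): the perceptron learns the linearly separable AND function, but no single line can ever fit XOR.

```python
# A minimal perceptron: it converges on linearly separable data (AND),
# but cannot fit XOR, which is not linearly separable.
def train_perceptron(points, labels, epochs=50):
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x0, x1), t in zip(points, labels):
            pred = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
            # Perceptron update rule: nudge weights only on mistakes
            w0 += (t - pred) * x0
            w1 += (t - pred) * x1
            b += (t - pred)
    return lambda x0, x1: 1 if w0 * x0 + w1 * x1 + b > 0 else 0

pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]   # linearly separable
xor_labels = [0, 1, 1, 0]   # not linearly separable

and_clf = train_perceptron(pts, and_labels)   # fits perfectly
xor_clf = train_perceptron(pts, xor_labels)   # always gets some points wrong
```

This is exactly the limitation that hidden layers (non-linear models) overcome.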
8. Components of a Neural Network
[Diagram: three inputs X1, X2, X3 feed into the input layer, which connects to a hidden layer and then to the output layer]
9. What is inside the hidden layer
Neurons
● A Neuron is the elementary part of the hidden layer
● Resembles Logistic Regression in its Primitive State
● Has 2 components:
○ Weight + Bias
○ Activation Function
[Diagram: input X → (WX + b) → activation (sigmoid) → output Y]
For example, predicting the likelihood of a person having a heart ailment can be expressed as:
(no. of cigarettes smoked per day) × W + b → sigmoid → predicted probability that the person has a heart ailment
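The single neuron above can be sketched in a few lines of Python; the weight and bias values here are purely illustrative, not fitted to any real data.

```python
import math

def neuron(x, w, b):
    """A single artificial neuron: linear part Wx + b, then a sigmoid activation."""
    z = w * x + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# Hypothetical weight and bias for the heart-ailment example
w, b = 0.3, -2.0

# Predicted probability of a heart ailment for someone smoking 10 cigarettes/day
p = neuron(10, w, b)
```

With a positive weight, more cigarettes per day pushes the predicted probability higher, which matches the intuition in the slide.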
10. Estimating the Weights - Loss Function
Loss Function
Quantifies our unhappiness with our prediction on the training set
Loss = f(Actuals, Predicted)
The objective is to minimize this loss function by choosing the optimal values of W (and b)
Estimating W (and b) to attain minimum Loss:
(Because Loss can be controlled by the weights and not by the data)
● Localized Random Search
● Gradient Descent
12. Estimating the Weights - Gradient Descent with an Example
age bp
39 144
47 220
45 138
47 145
65 162
46 142
67 170
● Initialize a line with slope 0 (or some random value)
● Calculate the residuals (the loss: our unhappiness with the model) and plot them in a separate graph
[Plot: loss decreasing over iterations]
● Calculate the rate of change of y with respect to x (slope = Δy/Δx); the slope is currently set to 0, but the best-fit value is certainly something greater than 0
● Now, alter the line using the new slope, and recompute the residuals/losses until we reach the minimum loss. The corresponding slope of the line is our model weight.
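The steps above can be sketched as plain-Python gradient descent on the age/bp table; the learning rate and iteration count are illustrative choices, not tuned values.

```python
# Fit bp ≈ w * age + b to the table above with gradient descent.
ages = [39, 47, 45, 47, 65, 46, 67]
bps = [144, 220, 138, 145, 162, 142, 170]

def mean_loss(w, b):
    # Mean squared residual: our unhappiness with the current line
    return sum((w * x + b - y) ** 2 for x, y in zip(ages, bps)) / len(ages)

def grad_step(w, b, lr=3e-4):
    n = len(ages)
    # Gradients of the mean squared loss with respect to w and b
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(ages, bps)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(ages, bps)) / n
    # Move against the gradient to reduce the loss
    return w - lr * dw, b - lr * db

w, b = 0.0, 0.0            # initialize the slope (and intercept) at 0
start = mean_loss(w, b)    # loss of the initial flat line
for _ in range(2000):
    w, b = grad_step(w, b)
# After training, w is the positive slope the slide describes,
# and the loss is far below where it started.
```

Keras automates exactly this loop (gradients via back propagation, updates via an optimizer), which is what the hands-on section uses.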
13. Examples
● Recognizing images
● Understanding speech
● Translating speech
● Grasping objects
● Avoiding obstacles
● Tagging text/images
Counter Examples
● Getting sarcasm/irony
● Making decisions/judgements where no data/model is available (gut feel)
● Generalizing to new scenarios
● Judging the character of a person
14. Back Propagation and Chain Rule
Quick differentiation refresher. Toy example: a simple computational graph for
f(x, y, z) = (x + y) * z
Compute ∂f/∂x, ∂f/∂y, ∂f/∂z, given x = -2, y = 5, z = -4.
Forward pass:
q = x + y = 3
f = q * z = -12
Local gradients:
∂q/∂x = 1; ∂q/∂y = 1
∂f/∂q = z; ∂f/∂z = q
Back propagate using the chain rule:
∂f/∂z = q = 3
∂f/∂x = (∂f/∂q)(∂q/∂x) = z × 1 = -4
∂f/∂y = (∂f/∂q)(∂q/∂y) = z × 1 = -4
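The forward and backward passes for this toy graph can be written out directly in Python, with each chain-rule step as one line:

```python
# Forward pass through the toy graph f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y          # intermediate node: q = 3
f = q * z          # output: f = -12

# Backward pass: apply the chain rule from the output back to the inputs
df_dq = z          # ∂f/∂q = z = -4
df_dz = q          # ∂f/∂z = q = 3
df_dx = df_dq * 1  # ∂q/∂x = 1, so ∂f/∂x = ∂f/∂q · ∂q/∂x = -4
df_dy = df_dq * 1  # ∂q/∂y = 1, so ∂f/∂y = ∂f/∂q · ∂q/∂y = -4
```

Back propagation in a real network is this same pattern applied node by node, from the loss backwards through every layer.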
15. Activation Functions
Activation functions add non-linearity to the model by allowing/blocking signals beyond a certain threshold.
Example: the sigmoid takes any number and ‘squashes’ it between 0 and 1; tanh similarly squashes its input between -1 and 1.
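A minimal sketch of the two activations mentioned, using only Python's math module, showing the squashing behaviour at extreme inputs:

```python
import math

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Sigmoid saturates toward 0 for large negative inputs and toward 1 for large positives
lo, mid, hi = sigmoid(-10), sigmoid(0), sigmoid(10)

# tanh squashes into (-1, 1) instead, and is centered at 0
t_lo, t_hi = math.tanh(-10), math.tanh(10)
```

Because tanh is zero-centered, it is often preferred over sigmoid for hidden layers, while sigmoid remains natural for outputs that represent probabilities.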