Deep learning
A primer for the curious developer

Uwe Friedrichsen & Dr. Shirin Glander – codecentric AG – 2018

This is the slide deck from a presentation that my colleague Shirin Glander (https://www.slideshare.net/ShirinGlander/) and I gave together. As we created our respective parts of the presentation on our own, it is quite easy to figure out who did which part: the two slide decks look quite different ... :)

For the sake of simplicity and completeness, I just copied the two slide decks together. As I did the "surrounding" part, I added Shirin's part at the point where she took over and then added my concluding slides at the end. Well, I'm sure you will figure it out easily ... ;)

The presentation was intended as an introduction to deep learning (DL) for people who are new to the topic. It starts with some DL success stories as motivation. Then a quick classification and a bit of history follow before the "how" part starts.

The first part of the "how" covers some DL theory, on the one hand to demystify the topic and to explain and connect some of the most important terms, and on the other hand to give an idea of the breadth of the topic.

After that, the second part dives deeper into the question of how to actually implement DL networks. It starts with coding everything on your own and then moves, step by step, towards approaches that require less and less coding, depending on where you want to start.

The presentation ends with some pitfalls and challenges that you should keep in mind if you want to dive deeper into DL - plus the invitation to become part of it.

As always, the voice track of the presentation is missing. I hope the slides are of some use to you nonetheless.

  1. 1. Deep learning A primer for the curious developer Uwe Friedrichsen & Dr. Shirin Glander – codecentric AG – 2018
  2. 2. @ufried Uwe Friedrichsen uwe.friedrichsen@codecentric.de @ShirinGlander Dr. Shirin Glander shirin.glander@codecentric.de
  3. 3. Why should I care about Deep Learning?
  4. 4. Deep learning has the potential to affect white collar workers (including IT) in a similar way as robots affected blue collar workers.
  5. 5. What is Deep Learning? Some success stories
  6. 6. What is Deep Learning? A rough classification
  7. 7. AI Artificial Intelligence ML Machine Learning RL Representational Learning DL Deep Learning
  8. 8. Traditional AI Focus on problems that are ... •  ... hard for humans •  ... straightforward for computers •  ... can be formally described Deep Learning Focus on problems that are ... •  ... intuitive for humans •  ... difficult for computers (hard to be described formally) •  ... best learnt from experience
  9. 9. Where does Deep Learning come from?
  10. 10. General evolution •  Two opposed forces •  Recreation of biological neural processing •  Abstract mathematical models (mostly linear algebra) •  Results in different models and algorithms •  No clear winner yet
  11. 11. Cybernetics (ca. 1940 - 1960) •  ADALINE, Perceptron •  Linear models, typically no hidden layers •  Stochastic Gradient Descent (SGD) •  Limited applicability •  E.g., ADALINE could not learn XOR •  Resulted in “First winter of ANN” (Artificial Neural Networks)
  12. 12. Connectionism (ca. 1980 - 1990) •  Neocognitron •  Non-linear models, distributed feature representation •  Backpropagation •  Typically 1, rarely more hidden layers •  First approaches of sequence modeling •  LSTM (Long short-term memory) in 1997 •  Unrealistic expectations nurtured by ventures •  Resulted in “Second winter of ANN”
  13. 13. Deep Learning (ca. 2006 -) •  Improved algorithms, advanced computing power •  Enabled training much larger and deeper networks •  Enabled training much larger data sets •  Typically several to many hidden layers •  Overcame the “feature extraction dilemma”
  14. 14. What is Deep Learning used for?
  15. 15. Deep Learning application areas •  Classification (incl. missing inputs) •  Regression (value prediction) •  Function prediction •  Density estimation •  Structured output (e.g., translation) •  Anomaly detection •  Synthesis and sampling •  Denoising •  Compression (dimension reduction) •  ...
  16. 16. How does Deep Learning work? A first (scientifically inspired) approach
  17. 17. "A computer program is said to learn •  from experience E •  with respect to some class of tasks T •  and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." -- T. Mitchell, Machine Learning, p. 2, McGraw Hill (1997) Supervised learning, unsupervised learning, reinforcement learning, ... Too difficult to solve with fixed programs designed by humans Accuracy vs. error rate, training vs. test set, ...
  18. 18. Err ... Hmm ... Well ... I don’t get it!
  19. 19. How does Deep Learning work? A second (more down-to-earth) approach
  20. 20. Operating principle Training Network types Deep Learning
  21. 21. Deep Learning Operating principle Training Network types Structure Behavior Weight Operation Neuron Data CNN Types Challenges Quality measure RNN LSTM Auto- encoder GAN MLP Training set Cost function Transfer learning Regulari- zation Layer Connection Hyper- parameter Activation function Reinforce- ment Unsuper- vised Supervised Stochastic gradient descent Back- propagation Under-/ Overfitting Validation/ Test set Optimization procedure
  22. 22. Deep Learning Operating principle Training Network types
  23. 23. Structure Behavior Operating principle
  24. 24. Structure Behavior Operating principle
  25. 25. Operating principle Structure Behavior Neuron
  26. 26. Neuron •  Design inspired by biological neurons •  One or more inputs •  Processing (and state storage) unit •  One or more outputs •  In practice often implemented as tensor transformations •  Relevance of internal state depends on network type •  Usually negligible for feed-forward networks •  Usually relevant for recurrent networks Neuron Processing (+ State) Output(s) Input(s) ... ...
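To make the neuron on slide 26 a bit more tangible, here is a minimal Python/NumPy sketch (the function name and the example values are illustrative, not taken from the deck): a neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function.

    import numpy as np

    def neuron(inputs, weights, bias, activation=np.tanh):
        # weighted sum of the inputs, shifted by a bias, then passed
        # through an activation function (tanh here, just as an example)
        return activation(np.dot(weights, inputs) + bias)

    # three inputs with arbitrary example weights
    x = np.array([0.5, -1.2, 3.0])
    w = np.array([0.4, 0.1, -0.7])
    print(neuron(x, w, bias=0.2))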
  27. 27. Layer Operating principle Structure Behavior Neuron
  28. 28. Layer •  Neurons typically organized in layers •  Input and output layer as default •  Optionally one or more hidden layers •  Layer layout can have 1-n dimensions •  Neurons in different layers can have different properties •  Different layers responsible for different (sub-)tasks Output layer Input layer ... N 1 2 Hidden layer(s) ...
  29. 29. Connection Operating principle Structure Behavior Neuron Layer
  30. 30. Connection •  Usually connect input and output tensor in a 1:1 manner •  Connect between layers (output layer N-1 → input layer N) •  Layers can be fully or partially (sparsely) connected •  RNNs also have backward and/or self connections •  Some networks have connections between neurons of the same layer (e.g., Hopfield nets, Boltzmann machines) Input tensor(s) Output tensor(s)
  31. 31. Weight Operating principle Structure Behavior Neuron Layer Connection
  32. 32. Weight •  (Logically) augments a connection •  Used to amplify or dampen a signal sent over a connection •  The actual “memory” of the network •  The “right” values of the weights are learned during training •  Can also be used to introduce a bias for a neuron •  By connecting it to an extra neuron that constantly emits 1 W Weight
  33. 33. Operation Operating principle Structure Behavior Neuron Layer Weight Connection
  34. 34. Input tensor(s) Output tensor(s) Step 1 •  For each neuron of input layer •  Copy resp. input tensor’s value to neuron’s input •  Calculate state/output using activation function (typically linear function, passing value through) Step 2-N •  For each hidden layer and output layer in their order •  For each neuron of the layer •  Calculate weighted sum of inputs •  Calculate state/output using activation function (see examples later) Final step •  For each neuron of output layer •  Copy neuron’s output to resp. output tensor’s value
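The update procedure of slides 34/35 can be sketched as a plain layer-by-layer loop. This is only an illustration, assuming a simple feed-forward net with one activation function for all layers (the helper name forward_pass is my own, not the deck's):

    import numpy as np

    def forward_pass(input_tensor, layers, activation=np.tanh):
        # step 1: the input layer simply takes over the input tensor's values
        values = input_tensor
        # steps 2..N: for each hidden/output layer compute the weighted sum
        # of its inputs and apply the activation function
        for weights, bias in layers:
            values = activation(weights @ values + bias)
        # final step: the output layer's values form the output tensor
        return values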
  35. 35. Input tensor(s) Output tensor(s) Step 1 Final step Step 2-N •  Default update procedure (most widespread) •  All neurons per layer updated in parallel •  Different update procedures exist •  E.g., some Hopfield net implementations randomly pick neurons for update
  36. 36. Activation function Operating principle Structure Behavior Neuron Layer Weight Connection Operation
  37. 37. Linear function •  Easy to handle •  Cannot handle non-linear problems
  38. 38. Logistic sigmoid function •  Very widespread •  Delimits output to [0, 1] •  Vanishing gradient problem
  39. 39. Hyperbolic tangent •  Very widespread •  Delimits output to [-1, 1] •  Vanishing gradient problem
  40. 40. Rectified linear unit (ReLU) •  Easy to handle •  No derivative in 0 •  Dying ReLU problem •  Can be mitigated, e.g., by using leaky ReLU
  41. 41. Softplus •  Smooth approximation of ReLU •  ReLU usually performs better •  Thus, use of softplus usually discouraged
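For reference, the activation functions of slides 37-41 written down in NumPy (a sketch; the leaky-ReLU slope of 0.01 is a common default I added, it is not mentioned in the deck):

    import numpy as np

    def linear(z):    return z                          # cannot handle non-linear problems
    def sigmoid(z):   return 1.0 / (1.0 + np.exp(-z))   # output in [0, 1]
    def tanh(z):      return np.tanh(z)                 # output in [-1, 1]
    def relu(z):      return np.maximum(0.0, z)         # subject to the dying-ReLU problem
    def leaky_relu(z, slope=0.01):                      # common mitigation of dying ReLU
        return np.where(z > 0.0, z, slope * z)
    def softplus(z):  return np.log1p(np.exp(z))        # smooth approximation of ReLU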
  42. 42. Hyper- parameter Operating principle Structure Behavior Neuron Layer Weight Connection Operation Activation function
  43. 43. Hyperparameter •  Influence network and algorithm behavior •  Often influence model capacity •  Not learned, but usually manually optimized •  Currently quite some research interest in automatic hyperparameter optimization Examples •  Number of hidden layers •  Number of hidden units •  Learning rate •  Number of clusters •  Weight decay coefficient •  Convolution kernel width •  ...
  44. 44. Training Deep Learning Operating principle Network types Structure Behavior Weight Operation Neuron Layer Connection Hyper- parameter Activation function
  45. 45. Quality measure Training
  46. 46. Cost function Training Quality measure
  47. 47. Cost function (a.k.a. loss function) •  Determines distance from optimal performance •  Mean squared error as simple (and widespread) example
  48. 48. Cost function (a.k.a. loss function) •  Determines distance from optimal performance •  Mean squared error as simple (and widespread) example •  Often augmented with regularization term for better generalization (see challenges)
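As an illustration of the "simple (and widespread) example" named on slides 47/48, mean squared error in NumPy:

    import numpy as np

    def mean_squared_error(predictions, targets):
        # average squared distance between the network's outputs
        # and the desired outputs
        return np.mean((predictions - targets) ** 2)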
  49. 49. Optimization procedure Training Quality measure Cost function
  50. 50. Training Quality measure Stochastic gradient descent Cost function Optimization procedure
  51. 51. Stochastic gradient descent •  Direct calculation of minimum often not feasible •  Instead stepwise “descent” using the gradient → Gradient descent
  52. 52. Stochastic gradient descent •  Direct calculation of minimum often not feasible •  Instead stepwise “descent” using the gradient → Gradient descent •  Not feasible for large training sets •  Use (small) random sample of training set per iteration → Stochastic gradient descent (SGD)
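A minimal sketch of one stochastic gradient descent step as described on slides 51/52 (gradient_fn, the batch size and the learning rate are illustrative assumptions, not values from the deck):

    import numpy as np

    def sgd_step(weights, gradient_fn, training_set, learning_rate=0.01, batch_size=32):
        # estimate the gradient on a small random sample of the training set ...
        indices = np.random.choice(len(training_set), batch_size)
        batch = [training_set[i] for i in indices]
        gradient = gradient_fn(weights, batch)
        # ... and take one step against it: x' = x - learning_rate * gradient
        return weights - learning_rate * gradient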
  53. 53. Stochastic gradient descent [Diagram: the gradient at a point x gives the direction and steepness of the descent]
  54. 54. Stochastic gradient descent [Diagram: one descent step from x to x' = x - ε * gradient, with learning rate ε]
  55. 55. Training Quality measure Stochastic gradient descent Back- propagation Cost function Optimization procedure
  56. 56. Backpropagation •  Procedure to calculate new weights based on loss function Depends on cost function Depends on activation function Depends on input calculation
  57. 57. Backpropagation •  Procedure to calculate new weights based on loss function •  Usually “back-propagated” layer-wise •  Most widespread optimization procedure Depends on cost function Depends on activation function Depends on input calculation
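To illustrate the "back-propagated layer-wise" idea of slides 56/57, here is a sketch for a tiny network with one hidden layer, sigmoid activations and a squared-error cost. It is one concrete instance chosen for illustration, not the general algorithm as presented in the deck:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop(x, target, W1, b1, W2, b2):
        # forward pass
        h = sigmoid(W1 @ x + b1)                        # hidden layer
        y = sigmoid(W2 @ h + b2)                        # output layer
        # backward pass: the error terms depend on the cost function
        # (squared error) and the activation function (sigmoid)
        delta_out = (y - target) * y * (1.0 - y)        # output layer error
        delta_hid = (W2.T @ delta_out) * h * (1.0 - h)  # error propagated back one layer
        # gradients of the cost with respect to weights and biases
        return np.outer(delta_hid, x), delta_hid, np.outer(delta_out, h), delta_out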
  58. 58. Data Training Quality measure Stochastic gradient descent Back- propagation Cost function Optimization procedure
  59. 59. Training set Validation/ Test set Data Training Quality measure Stochastic gradient descent Back- propagation Cost function Optimization procedure
  60. 60. Data set •  Consists of examples (a.k.a. data points) •  Example always contains input tensor •  Sometimes also contains expected output tensor (depending on training type) •  Data set usually split into several parts •  Training set – optimize accuracy (always used) •  Test set – test generalization (often used) •  Validation set – tune hyperparameters (sometimes used)
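The split described on slide 60 could look like this (the 70/15/15 ratio is just an illustrative choice):

    import numpy as np

    def split_data_set(examples, train_fraction=0.7, validation_fraction=0.15):
        # shuffle, then cut into training, validation and test set
        examples = np.random.permutation(examples)
        n_train = int(len(examples) * train_fraction)
        n_val = n_train + int(len(examples) * validation_fraction)
        return examples[:n_train], examples[n_train:n_val], examples[n_val:]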
  61. 61. Data Types Training Quality measure Stochastic gradient descent Back- propagation Training set Validation/ Test set Cost function Optimization procedure
  62. 62. Supervised Data Types Training Quality measure Stochastic gradient descent Back- propagation Training set Validation/ Test set Cost function Optimization procedure
  63. 63. Supervised learning •  Typically learns from a large, yet finite set of examples •  Examples consist of input and output tensor •  Output tensor describes desired output •  Output tensor also called label or target •  Typical application areas •  Classification •  Regression and function prediction •  Structured output problems
  64. 64. Unsupervised Data Types Supervised Training Quality measure Stochastic gradient descent Back- propagation Training set Validation/ Test set Cost function Optimization procedure
  65. 65. Unsupervised learning •  Typically learns from a large, yet finite set of examples •  Examples consist of input tensor only •  Learning algorithm tries to learn useful properties of the data •  Requires different types of cost functions •  Typical application areas •  Clustering, density estimations •  Denoising, compression (dimension reduction) •  Synthesis and sampling
  66. 66. Reinforcement Data Types Supervised Training Quality measure Unsupervised Stochastic gradient descent Back- propagation Training set Validation/ Test set Cost function Optimization procedure
  67. 67. Reinforcement learning •  Continuously optimizes interaction with an environment based on reward-based learning [Diagram: the agent receives state t and reward t from the environment, sends back action t, and receives state t+1 and reward t+1 in return]
  68. 68. Reinforcement learning •  Continuously optimizes interaction with an environment based on reward-based learning •  Goal is selection of action with highest expected reward •  Takes (discounted) expected future rewards into account •  Labeling of examples replaced by reward function •  Can continuously learn → data set can be infinite •  Typically used to solve complex tasks in (increasingly) complex environments with (very) limited feedback
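The agent/environment interaction of slides 67/68 as a loop in Python. The agent and environment objects and their methods are assumptions for illustration, not a concrete library API:

    def reinforcement_learning(environment, agent, episodes=1000):
        for _ in range(episodes):
            state = environment.reset()
            done = False
            while not done:
                # the agent picks the action with the highest expected reward
                action = agent.select_action(state)
                # the environment answers with the next state and a reward;
                # the reward function replaces the labeling of examples
                next_state, reward, done = environment.step(action)
                agent.learn(state, action, reward, next_state)
                state = next_state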
  69. 69. Challenges Data Types Supervised Training Quality measure Unsupervised Reinforcement Stochastic gradient descent Back- propagation Training set Validation/ Test set Cost function Optimization procedure
  70. 70. Data Types Supervised Training Quality measure Unsupervised Reinforcement Stochastic gradient descent Back- propagation Under-/ Overfitting Training set Validation/ Test set Cost function Challenges Optimization procedure
  71. 71. Underfitting and Overfitting •  Training error describes how well the training data is learnt •  Test error is an indicator of generalization capability •  Core challenge for all machine learning type algorithms 1.  Make training error small 2.  Make gap between training and test error small •  Underfitting is the violation of #1 •  Overfitting is the violation of #2
  72. 72. [Diagram: good fit vs. underfitting vs. overfitting, shown on training data and test data]
  73. 73. Underfitting and Overfitting •  Under- and overfitting influenced by model capacity •  Too low capacity usually leads to underfitting •  Too high capacity usually leads to overfitting •  Finding the right capacity is a challenge
  74. 74. Data Types Supervised Training Quality measure Unsupervised Reinforcement Stochastic gradient descent Back- propagation Under-/ Overfitting Training set Validation/ Test set Cost function Regularization Challenges Optimization procedure
  75. 75. Regularization •  Regularization is a modification applied to the learning algorithm •  to reduce the generalization error •  but not the training error •  Weight decay is a typical regularization measure
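Weight decay, the regularization measure named on slide 75, simply adds a penalty on large weights to the cost (the decay factor 0.01 is an illustrative value):

    import numpy as np

    def cost_with_weight_decay(cost, weights, decay=0.01):
        # the penalty grows with the squared size of the weights,
        # pushing the learning algorithm towards smaller weights
        return cost + decay * np.sum(weights ** 2)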
  76. 76. Data Types Supervised Training Quality measure Unsupervised Reinforcement Stochastic gradient descent Back- propagation Under-/ Overfitting Transfer learning Training set Validation/ Test set Cost function Regularization Challenges Optimization procedure
  77. 77. Transfer learning •  How to transfer insights between related tasks •  E.g., is it possible to transfer knowledge gained while training to recognize cars to the problem of recognizing trucks? •  General machine learning problem •  Subject of many research activities
  78. 78. Network types Deep Learning Operating principle Training Structure Behavior Weight Operation Neuron Data Types Challenges Quality measure Training set Cost function Transfer learning Regulari- zation Layer Connection Hyper- parameter Activation function Reinforce- ment Unsuper- vised Supervised Stochastic gradient descent Back- propagation Under-/ Overfitting Validation/ Test set Optimization procedure
  79. 79. MLP Multilayer Perceptron Network types
  80. 80. Multilayer perceptron (MLP) •  Multilayer feed-forward networks •  “Vanilla” neural networks •  Typically used for •  Function approximation •  Regression •  Classification Image source: https://deeplearning4j.org
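As a rough idea of what such a network looks like in code, here is a small MLP sketched with Keras (Keras appears in the references of the deck; layer sizes, activations and the toy data are my own illustrative choices):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential([
        Dense(32, activation='relu', input_shape=(10,)),  # hidden layer
        Dense(1)                                          # output layer, e.g. for regression
    ])
    model.compile(optimizer='sgd', loss='mean_squared_error')

    x = np.random.rand(100, 10)   # toy inputs
    y = np.random.rand(100, 1)    # toy targets
    model.fit(x, y, epochs=5, batch_size=16)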
  81. 81. CNN Convolutional Neural Network Network types MLP Multilayer Perceptron
  82. 82. Convolutional neural network (CNN) •  Special type of MLP for image processing •  Connects convolutional neuron only with receptive field •  Advantages •  Less computing power required •  Often even better recognition rates •  Inspired by organization of visual cortex Image source: https://deeplearning4j.org
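A correspondingly small CNN sketch in Keras (again with illustrative sizes; the input shape assumes 28x28 grayscale images):

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    model = Sequential([
        Conv2D(16, kernel_size=(3, 3), activation='relu',
               input_shape=(28, 28, 1)),         # convolution over local receptive fields
        MaxPooling2D(pool_size=(2, 2)),          # downsampling
        Flatten(),
        Dense(10, activation='softmax')          # e.g. 10 image classes
    ])
    model.compile(optimizer='sgd', loss='categorical_crossentropy')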
  83. 83. RNN Recurrent Neural Network Network types MLP Multilayer Perceptron CNN Convolutional Neural Network
  84. 84. Recurrent neural network (RNN) •  Implements internal feedback loops •  Provides a temporal memory •  Typically used for •  Speech recognition •  Text recognition •  Time series processing Image source: https://deeplearning4j.org
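A simple recurrent network sketched in Keras (sequence length, feature count and layer size are made-up example values):

    from keras.models import Sequential
    from keras.layers import SimpleRNN, Dense

    model = Sequential([
        SimpleRNN(32, input_shape=(20, 8)),  # 20 time steps, 8 features per step
        Dense(1)                             # e.g. the next value of a time series
    ])
    model.compile(optimizer='sgd', loss='mean_squared_error')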
  85. 85. LSTM Long Short-Term Memory Network types MLP Multilayer Perceptron CNN Convolutional Neural Network RNN Recurrent Neural Network
  86. 86. Long short-term memory (LSTM) •  Special type of RNN •  Uses special LSTM units •  Can implement very long-term memory while avoiding the vanishing/exploding gradient problem •  Same application areas as RNN Image source: https://deeplearning4j.org
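The same sketch with LSTM units instead of plain recurrent units (sizes again illustrative):

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    model = Sequential([
        LSTM(32, input_shape=(20, 8)),  # LSTM units provide the long(er)-term memory
        Dense(1)
    ])
    model.compile(optimizer='sgd', loss='mean_squared_error')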
  87. 87. Auto- encoder Network types MLP Multilayer Perceptron CNN Convolutional Neural Network RNN Recurrent Neural Network LSTM Long Short-Term Memory
  88. 88. Autoencoder •  Special type of MLP •  Reproduces input at output layer •  Consists of encoder and decoder •  Usually configured undercomplete •  Learns efficient feature codings •  Dimension reduction (incl. compression) •  Denoising •  Usually needs pre-training so that it does not merely reconstruct the average of the training set Image source: https://deeplearning4j.org
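An undercomplete autoencoder sketched in Keras: the encoding layer is smaller than the input, so the network has to learn an efficient feature coding (all sizes are illustrative):

    from keras.models import Sequential
    from keras.layers import Dense

    autoencoder = Sequential([
        Dense(8, activation='relu', input_shape=(64,)),  # encoder: 64 -> 8
        Dense(64, activation='sigmoid')                  # decoder: 8 -> 64
    ])
    autoencoder.compile(optimizer='sgd', loss='mean_squared_error')
    # trained with the input as its own target, e.g. autoencoder.fit(x, x, ...)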
  89. 89. GAN Generative Adversarial Networks Network types MLP Multilayer Perceptron CNN Convolutional Neural Network RNN Recurrent Neural Network Auto- encoder LSTM Long Short-Term Memory
  90. 90. Generative adversarial networks (GAN) •  Consists of two (adversarial) networks •  Generator creating fake images •  Discriminator trying to identify fake images •  Typically used for •  Synthesis and sampling (e.g., textures in games) •  Structured output with variance (e.g., variations of a design or voice generation) •  Probably best known for creating fake celebrity images Image source: https://deeplearning4j.org
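The two adversarial networks of a GAN, sketched in Keras without the training loop (all sizes are illustrative example values):

    from keras.models import Sequential
    from keras.layers import Dense

    generator = Sequential([                          # maps random noise to a fake sample
        Dense(64, activation='relu', input_shape=(16,)),
        Dense(784, activation='sigmoid')              # e.g. a flattened 28x28 fake image
    ])
    discriminator = Sequential([                      # classifies samples as real or fake
        Dense(64, activation='relu', input_shape=(784,)),
        Dense(1, activation='sigmoid')
    ])
    discriminator.compile(optimizer='sgd', loss='binary_crossentropy')

During training one would alternate between training the discriminator on real and generated samples and training the generator to fool it.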
  91. 91. Deep Learning Operating principle Training Network types Structure Behavior Weight Operation Neuron Data CNN Types Challenges Quality measure RNN LSTM Auto- encoder GAN MLP Training set Cost function Transfer learning Regulari- zation Layer Connection Hyper- parameter Activation function Reinforce- ment Unsuper- vised Supervised Stochastic gradient descent Back- propagation Under-/ Overfitting Validation/ Test set Optimization procedure
  92. 92. How does Deep Learning feel in practice?
  93. 93. What issues might I face if diving deeper?
  94. 94. Issues you might face •  Very fast moving research domain •  You need the math. Really! •  How much data do you have? •  GDPR: Can you explain the decision of your network? •  Meta-Learning as the next step •  Monopolization of research and knowledge
  95. 95. Wrap-up
  96. 96. Wrap-up •  Broad, diverse topic •  Very good library support and more •  Very active research topic •  No free lunch •  You need the math! → Exciting and important topic – become a part of it!
  97. 97. References •  I. Goodfellow, Y. Bengio, A. Courville, ”Deep learning", MIT press, 2016, also available via https://www.deeplearningbook.org •  C. Perez, “The Deep Learning AI Playbook”, Intuition Machine Inc., 2017 •  F. Chollet, "Deep Learning with Python", Manning Publications, 2017 •  OpenAI, https://openai.com •  Keras, https://keras.io •  Deep Learning for Java, https://deeplearning4j.org/index.html •  Deep Learning (Resource site), http://deeplearning.net
  98. 98. @ShirinGlander Dr. Shirin Glander shirin.glander@codecentric.de @ufried Uwe Friedrichsen uwe.friedrichsen@codecentric.de