2. Outline for Today
Section I. Basics of Convolutional Neural Networks
◦ What is a CNN?
◦ Comparison with traditional Neural Networks
◦ Why do we need CNNs?
◦ Boosting Technologies for CNNs
Section II. More Details of Convolutional Neural Networks
◦ AlexNet: a network for classification (the “Equation”)
◦ Optimization methods in neural networks (the numerical “Solver”)
Section III. Convolutional Neural Networks with TensorFlow and TFLearn
3. Section I. The Basics
Image from http://parse.ele.tue.nl/cluster/2/CNNArchitecture.jpg
4. What is a Convolutional Neural Network?
What is convolution?
◦ It is a specialized linear operation.
◦ A 2D convolution is shown on the right. (Images From: community.arm.com)
◦ Strictly speaking, it’s cross-correlation.
◦ In CNNs, all convolution operations are actually cross-correlation.
Convolutional neural networks are neural networks that use convolution in place of general
matrix multiplication in at least one of their layers. They are very powerful in processing data
with grid-like topology. [1]
[1] Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning
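The difference between convolution and cross-correlation can be sketched in a few lines of NumPy. This is only an illustration: the function names `cross_correlate2d` and `convolve2d` and the toy 4x4 image are assumptions made for this example, not code from the slides.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel without flipping it."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the local patch and the kernel, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """True convolution: flip the kernel along both axes, then cross-correlate."""
    return cross_correlate2d(image, kernel[::-1, ::-1])

# Toy input (hypothetical values chosen only for illustration).
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

xcorr = cross_correlate2d(image, kernel)  # what CNN layers actually compute
conv = convolve2d(image, kernel)          # the textbook definition
```

For this asymmetric kernel the two results differ in sign. Because the kernel weights are learned, a CNN can absorb the flip into the weights, which is why the distinction rarely matters in practice.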
5. Comparison with MLP
In the last lecture, we met the MLP (multi-layer perceptron), where the operation from one
layer to the neurons in the next layer is a matrix multiplication controlled by weights and biases.
In CNNs, where do those “neurons” go?
◦ Each neuron is one element of the output matrix after convolution
◦ Weights are shared: the same kernel is applied at every position
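The effect of weight sharing on model size can be checked with simple arithmetic. The sketch below assumes a 28x28 single-channel input mapped to an output of the same size and a 3x3 kernel; these sizes and the helper names are illustrative choices, not taken from the slides.

```python
def dense_params(n_in, n_out):
    """Fully connected layer: one weight per input/output pair, plus one bias per output."""
    return n_in * n_out + n_out

def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Convolutional layer: one shared kernel per (input, output) channel pair, plus biases."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

n_pixels = 28 * 28                         # assumed input size (flattened image)
dense = dense_params(n_pixels, n_pixels)   # every output sees every input
conv = conv_params(3, 3, 1, 1)             # one 3x3 kernel reused at every position

print("dense layer parameters:", dense)
print("conv layer parameters: ", conv)
```

Under these assumptions the dense layer needs 615,440 parameters while the convolutional layer needs 10.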
6. Comparison with MLP
Local Connections
[Figure: connection diagrams, panels A, B and C]
In A, with convolution kernel size = 3, each activated neuron is affected only by neighbouring neurons, unlike in B,
where the layers are fully connected. With depth, however, the receptive field expands, so neurons in higher layers
become indirectly connected to all neurons in the lower layer.
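How quickly the receptive field grows with depth can be computed with the standard recurrence for stacked convolutions. This is a minimal sketch that assumes every layer uses the same kernel size and stride; `receptive_field` is a hypothetical helper name.

```python
def receptive_field(num_layers, kernel_size=3, stride=1):
    """Width of the input region that influences one neuron after num_layers conv layers."""
    rf = 1     # a single input element sees only itself
    jump = 1   # distance in input units between neighbouring neurons of the current layer
    for _ in range(num_layers):
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

for depth in (1, 2, 3, 10):
    print(depth, "layers ->", receptive_field(depth), "input elements")
```

With kernel size 3 and stride 1, each extra layer widens the receptive field by 2, so a sufficiently deep stack eventually covers the whole input.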
7. Why Do We Need Convolutional Neural Networks?
Many challenges we could not deal with in the past can now be tackled with CNNs. Yes, we can! :D
Many things we could already do in the past, we can now do better with CNNs!
CNNs represent the current state of the art in classification, object detection, etc.
Now, let’s take a brief look at these achievements…
8. MNIST Hand-written digits recognition
The MNIST database of handwritten digits
◦ Has a training set of 60000 examples,
◦ Has a test set of 10000 examples,
◦ Is a subset of a larger set available from NIST (National Institute of Standards and Technology)
◦ The digits have been size-normalized (28x28) and centered in a fixed-size image.
http://simonwinder.com/2015/07/training-neural-nets-on-mnist-digits/
9. MNIST Classification Record [1]
Classifier                 Preprocessing                       Best Test Error Rate (%)
Linear Classifiers         deskewing                           7.6
K-Nearest Neighbours       Shape-context feature extraction    0.63
Boosted Stumps             Haar features                       0.87
Non-linear classifiers     none                                3.3
SVMs                       deskewing                           0.56
Neural Nets                none                                0.35
Convolutional Neural Nets  Width normalization                 0.23
[1] http://yann.lecun.com/exdb/mnist/
10. The ImageNet Challenge [1][2]
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object
category classification and detection on hundreds of object categories and millions of images
◦ The ILSVRC challenge ran annually from 2010, following in the footsteps of the PASCAL VOC
challenge, which was established in 2005.
◦ ILSVRC 2010: 1,461,406 images and 1,000 object classes.
◦ Images are annotated, and annotations fall into one of two categories:
◦ (1) image-level annotation of a binary label for the presence or absence of an object class in the image;
◦ (2) object-level annotation of a tight bounding box and class label around an object instance in the image.
◦ ILSVRC 2017 was the last ILSVRC challenge.
◦ Over these years, several convolutional neural network architectures won first place:
◦ AlexNet 2012
◦ InceptionNet 2014
◦ Deep Residual Network 2015
[1] http://image-net.org/challenges/LSVRC/2017/
[2] Olga Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge
15. Technology Behind PRISMA [1]
Deep Convolutional Neural Networks
(a) Separate the content and style of an image
(b) Recombine the content of one image with the style of another image
[1] Leon A. Gatys et al., A Neural Algorithm of Artistic Style
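In the cited paper, style is represented by Gram matrices (correlations between feature maps), which discard where things are in the image. The sketch below illustrates only that property on random stand-in features; the array shapes and the name `gram_matrix` are assumptions for this example, and no real network is involved.

```python
import numpy as np

def gram_matrix(features):
    """features: (channels, height, width) -> (channels, channels) feature correlations."""
    channels = features.shape[0]
    flat = features.reshape(channels, -1)   # one row per feature map
    return flat @ flat.T                    # sum over all spatial positions

# Random stand-in for the feature maps of one CNN layer (hypothetical data).
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 8))

# Circularly shift the spatial layout: the "content" moves, the statistics do not.
shifted = np.roll(features, shift=3, axis=2)

same_style = np.allclose(gram_matrix(features), gram_matrix(shifted))
same_content = np.allclose(features, shifted)
print("style statistics unchanged:", same_style)
print("feature maps unchanged:    ", same_content)
```

The Gram matrix stays the same while the feature maps themselves change, which is what allows style and content to be treated separately.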
16. Boosting Technology for CNNs
The first CNN prototype appeared much earlier, so why has it become so popular only in recent
years?
◦ Huge amounts of data and advanced storage/memory systems
◦ GPU acceleration, which is very fast for convolution operations (Nvidia GPU Tesla K40, 1.4 TFLOPS)
◦ Deep neural network structures
◦ Optimization methods for training deep CNNs, such as stochastic gradient descent
◦ Off-the-shelf software packages are available and easy to use
◦ Progress in both hardware and software makes CNNs the ONE!
17. Section II: More Details [1]
http://www.ritchieng.com/machine-learning/deep-learning/convs/
[1] Slides in Section II are credited to slides presented by Tugce Tasci and Kyunghee Kim
26. Overlapping Pooling
Pooling summarizes the outputs of neighbouring groups of neurons in the same kernel map.
Two important parameters:
◦ Kernel size: z
◦ Stride size: s
◦ If s < z, the pooling windows overlap
In the experiment, overlapping pooling with s=2, z=3 reduces the top-1 and top-5 error rates by 0.4%
and 0.3%, respectively, compared with the non-overlapping case s=2, z=2.
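The two settings can be compared on a small array. This is a minimal NumPy sketch of max-pooling without padding; `pooled_size` and `max_pool2d` are hypothetical helper names and the 5x5 input is chosen only for illustration.

```python
import numpy as np

def pooled_size(n, z, s):
    """Output length along one axis for window size z and stride s (no padding)."""
    return (n - z) // s + 1

def max_pool2d(x, z, s):
    """Max-pooling with a z-by-z window moved in steps of s."""
    out_h = pooled_size(x.shape[0], z, s)
    out_w = pooled_size(x.shape[1], z, s)
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * s:i * s + z, j * s:j * s + z].max()
    return out

x = np.arange(25).reshape(5, 5)

overlapping = max_pool2d(x, z=3, s=2)      # s < z: neighbouring windows share a row/column
non_overlapping = max_pool2d(x, z=2, s=2)  # s == z: windows tile the input

print(overlapping)
print(non_overlapping)
```

Both settings halve the resolution in roughly the same way; for example, `pooled_size(55, 3, 2)` gives 27 for an illustrative 55-wide feature map. The difference is that with s < z each input element can contribute to more than one pooled output.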
31. Train the CNNs: Optimization Techniques
Back-propagation
◦ Sparse connections in CNNs decrease the complexity of back-propagation
◦ The ReLU activation function alleviates the vanishing gradient problem
Stochastic Gradient Descent
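The idea of stochastic gradient descent, updating the parameters from one training example at a time, can be shown on a toy problem. The sketch below fits a line rather than a CNN; the data, learning rate and the name `sgd_linear_fit` are assumptions chosen for illustration.

```python
import random

def sgd_linear_fit(xs, ys, lr=0.05, epochs=500, seed=0):
    """Fit y = w*x + b by SGD on the squared error, one example per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)              # visit the examples in random order
        for i in indices:
            err = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * err**2 with respect to w and b.
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b

# Noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = sgd_linear_fit(xs, ys)
print("w =", round(w, 3), "b =", round(b, 3))
```

In a real CNN the same update rule is applied to all kernel weights, with the gradients supplied by back-propagation and usually averaged over a mini-batch.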
48. Section III. CNNs with TensorFlow and TFLearn
Images from Peter Goldsborough, A Tour of Tensorflow
49. Tensorflow
Tensorflow is an open-source library for numerical computation using data flow graphs
◦ Developed by the Google Brain Team and Google’s Machine Intelligence research organization.
Implementing ML in TensorFlow
◦ In tensorflow, computations are represented using Graphs
◦ Each node is an operation (OP)
◦ Data is represented as Tensors
◦ OP takes Tensors and returns Tensors
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
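The graph idea can be mimicked in plain Python to show that building the graph and running it are separate steps. This toy sketch is not TensorFlow's API: the `Node` class and the `constant`, `add`, `multiply` and `run` helpers are hypothetical names invented for this illustration.

```python
class Node:
    """One operation (OP) in the graph; its inputs are other nodes."""
    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value

def constant(value):
    return Node("const", value=value)

def add(a, b):
    return Node("add", (a, b))

def multiply(a, b):
    return Node("mul", (a, b))

def run(node):
    """Evaluate a node by first evaluating the nodes it depends on."""
    if node.op == "const":
        return node.value
    args = [run(n) for n in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError("unknown op: " + node.op)

# Building the graph performs no arithmetic yet.
x = constant(3.0)
w = constant(2.0)
b = constant(1.0)
y = add(multiply(w, x), b)

# Only running the output node triggers the computation.
result = run(y)
print(result)
```

Here plain floats stand in for tensors; in TensorFlow the values flowing along the edges are multi-dimensional arrays.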
50. Construction of Computational Graph
Follow the 3-step pattern:
◦ 1. inference() – Builds the graph as far as is required for running the network forward to make
predictions
◦ 2. loss() – Adds to the inference graph the ops required to generate loss
◦ 3. training() – Adds to the loss graph the ops required to compute and apply gradients
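The same three-step structure can be sketched with NumPy on a tiny softmax classifier. This is an eager, simplified stand-in written for illustration, not the TensorFlow code from the demo: in TensorFlow each function would add ops to the graph instead of computing values directly, and the toy data and learning rate are assumptions.

```python
import numpy as np

def inference(x, weights, bias):
    """Step 1: forward pass producing class probabilities (softmax)."""
    logits = x @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def loss(probs, labels):
    """Step 2: mean cross-entropy between predictions and one-hot labels."""
    return -np.mean(np.sum(labels * np.log(probs + 1e-12), axis=1))

def training(x, labels, weights, bias, lr=0.5):
    """Step 3: compute the gradients of the loss and apply one update."""
    probs = inference(x, weights, bias)
    grad_logits = (probs - labels) / x.shape[0]  # gradient of the mean cross-entropy
    weights = weights - lr * (x.T @ grad_logits)
    bias = bias - lr * grad_logits.sum(axis=0)
    return weights, bias

# Toy data: two examples, two classes (hypothetical values).
x = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
weights = np.zeros((2, 2))
bias = np.zeros(2)

loss_before = loss(inference(x, weights, bias), labels)
for _ in range(100):
    weights, bias = training(x, labels, weights, bias)
loss_after = loss(inference(x, weights, bias), labels)
print(loss_before, "->", loss_after)
```

Each step builds on the previous one: `loss()` consumes the output of `inference()`, and `training()` needs the loss gradient, mirroring how the graph is extended step by step.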
52. Load the training data, using MNIST
from tensorflow.examples.tutorials.mnist import input_data
62. TFLearn
TFLearn is an abstraction library built on top of TensorFlow that provides high-level building
blocks to quickly construct TensorFlow graphs.
◦ Highly modular interface
◦ Allows rapid chaining of neural network layers, regularization functions, optimizers and other elements
◦ Can be mixed with plain TensorFlow code
In the following part, let’s implement the previous CNN model with tflearn, and see how much
easier life is now!
TFLearn Website http://tflearn.org/
63. Redo the same thing with TFLearn
Import the packages
67. Conclusion
Pros:
◦ Deep Convolutional Neural Networks represent the current state of the art in image
classification, object detection and localization
◦ Powerful CNN models include AlexNet, InceptionNet and Deep Residual Networks
◦ Open-source libraries make it possible to deploy CNN applications very quickly
◦ Convolutional Neural Networks can share pre-trained weights, which is the basis for transfer learning
Cons:
◦ The interpretation and mechanism of CNNs are not clear; we do not fully understand why they work better than
previous models
◦ Large amounts of training data and annotations are needed, which may not be practical for some
problems.