Interest in Deep Learning has been growing in the past few years. With advances in software and hardware technologies, Neural Networks are making a resurgence. With interest in AI-based applications growing, and companies like IBM, Google, Microsoft and NVIDIA investing heavily in computing and software platforms, it is time to understand Deep Learning better!
In this workshop, we will discuss the basics of Neural Networks and how Deep Learning networks differ from conventional Neural Network architectures. We will review some of the mathematics that goes into building neural networks and understand the role of GPUs in Deep Learning. We will also get an introduction to Autoencoders, Convolutional Neural Networks and Recurrent Neural Networks, and survey the state of the art in hardware and software architectures. Functional demos will be presented in Keras, a popular Python package with backends in Theano and TensorFlow.
Deep learning with Keras
1. Location:
ODSC 2017
5/4/2017
Deep Learning with Keras
2016 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.analyticscertificate.com
2. 2
Slides and Code will be available at:
http://www.analyticscertificate.com/ODSC2017
3. - Analytics Advisory services
- Custom training & certificate programs
- Fintech and Energy Analytics and Infrastructure
4. • Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior experience at MathWorks, Citigroup and
Endeca, and with 25+ financial services and energy
customers.
• Regular Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Chartered Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA
program and at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO
4
5. 5
Quantitative Analytics and Big Data Analytics Onboarding
• Trained more than 1000 students in
Quantitative methods, Data Science
and Big Data Technologies using
MATLAB, Python and R
• Launching the Analytics Certificate
Program in Summer and a Fintech
Certification program in Fall
6. 6
• May 2017
▫ Sponsoring the CFA Fintech Conference in Boston
▫ QuantUniversity Chicago Meetup
Deep Learning – May 18th - https://www.meetup.com/QuantUniversity-
Meetup-Chicago
▫ Deep Learning Workshop – May 30,31st
Chicago & Online : http://www.analyticscertificate.com/DeepLearning
• June 2017
▫ Machine Learning Workshop – June 8th, 9th
New York & Online : http://www.analyticscertificate.com/MachineLearning
▫ Anomaly Detection Workshop – June 18th, 19th
Boston & Online : http://www.analyticscertificate.com/Anomaly
Events of Interest
10. 10
• Unsupervised Algorithms
▫ Given a dataset with variables 𝑥𝑖, build a model that captures the
similarities in different observations and assigns them to different
buckets => Clustering, etc.
▫ Create a transformed representation of the original data => PCA
Machine Learning
[Diagram: Obs1, Obs2, Obs3, etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1]
11. 11
• Supervised Algorithms
▫ Given a set of variables 𝑥𝑖, predict the value of another variable 𝑦 in a
given data set such that
▫ If y is numeric => Prediction
▫ If y is categorical => Classification
Machine Learning
[Diagram: x1, x2, x3, … → Model F(X) → y]
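A minimal Python sketch of this distinction (not from the slides; scikit-learn models are used purely for illustration): the same features x1, x2, x3 can drive either a numeric target (prediction) or a categorical target (classification).
```python
# Illustration only: numeric y => prediction, categorical y => classification.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((100, 3))                       # x1, x2, x3
y_numeric = X @ np.array([1.0, 2.0, -1.0])     # numeric target
y_class = (y_numeric > 1.0).astype(int)        # categorical target

reg = LinearRegression().fit(X, y_numeric)     # F(X) approximates a numeric y
clf = LogisticRegression().fit(X, y_class)     # F(X) assigns a class label
print(reg.predict(X[:3]), clf.predict(X[:3]))
```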
21. 22
1. Our labeled datasets were thousands of times too small.
2. Our computers were millions of times too slow.
3. We initialized the weights in a stupid way.
4. We used the wrong type of non-linearity.
- Geoff Hinton
Neural nets were tried in the 1980s. What changed?
https://youtu.be/IcOMKXAw5VA?t=21m29s
27. 28
Computer-Aided Diagnosis with Deep Learning Architecture: Applications to Breast Lesions in US Images and Pulmonary
Nodules in CT Scans http://www.nature.com/articles/srep24454/figures/1
28. 29
Towards End-to-End Speech Recognition with Recurrent Neural Networks
http://www.jmlr.org/proceedings/papers/v32/graves14.pdf
38. 39
How is deep learning special?
Given (lots of) data, DNNs learn useful input
representations.
D. Erhan et al. ‘09
http://www.iro.umontreal.ca/~lisa/publications2/index.php/publications/show/247
44. 45
• Theano is a Python library that allows you to define, optimize, and
evaluate mathematical expressions involving multi-dimensional
arrays efficiently
• Performs efficient symbolic differentiation
• Leverages NVIDIA GPUs (claimed up to 140x faster than CPU)
• Developed by University of Montreal researchers and is open-source
• Works on Windows/Linux/Mac OS
• See https://arxiv.org/abs/1605.02688
Theano
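As a small illustration of the symbolic-differentiation point above (a minimal sketch, not code from the workshop notebooks), Theano lets you build an expression, ask for its gradient, and compile the result into a callable function.
```python
# Minimal Theano sketch: define a symbolic expression and differentiate it.
import theano
import theano.tensor as T

x = T.dscalar('x')                  # symbolic scalar
y = x ** 2 + 3 * x                  # symbolic expression
grad_y = T.grad(y, x)               # symbolic derivative: 2*x + 3

f = theano.function([x], grad_y)    # compile to a callable (runs on CPU or GPU)
print(f(4.0))                       # -> 11.0
```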
45. 46
• GPU vs CPU
▫ Theano Test
▫ See Theano Test.ipynb
Demo
51. 52
• Keras is a high-level neural networks library, written in Python and
capable of running on top of either TensorFlow or Theano. It was
developed with a focus on enabling fast experimentation.
• Allows for easy and fast prototyping (through total modularity,
minimalism, and extensibility).
• Supports both convolutional networks and recurrent networks, as
well as combinations of the two.
• Supports arbitrary connectivity schemes (including multi-input and
multi-output training).
• Runs seamlessly on CPU and GPU.
Keras
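To show what "easy and fast prototyping" looks like in practice, here is a minimal Keras sketch (layer sizes are arbitrary and chosen only for illustration) that builds and compiles a small feed-forward classifier.
```python
# Minimal Keras sketch: a small multi-layer perceptron for binary classification.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))    # hidden layer
model.add(Dense(1, activation='sigmoid'))                # output layer
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
# model.fit(X_train, y_train, ...) would train it on your data
```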
52. 53
• Keras Examples
▫ Testing Keras: See KerasPython.ipynb
▫ MLP (1 layer)
▫ Running Convolutional NN on Keras with a Theano Backend
See Keras-conv-example-mnist.ipynb
Demo
55. 56
• Goal is to have the reconstruction 𝑥̂ approximate the input 𝑥
• Interesting applications such as
▫ Data compression
▫ Visualization
▫ Pre-train neural networks
Autoencoder
56. 57
Demo in Keras1
1. https://blog.keras.io/building-autoencoders-in-keras.html
2. https://keras.io/models/model/
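Along the lines of the Keras blog post referenced above, a minimal autoencoder can be written as below (the 784/32 dimensions follow the MNIST example in that post; treat this as a sketch rather than the exact demo code).
```python
# Minimal Keras autoencoder sketch: encode 784-dim inputs to 32 dims and back.
from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))
encoded = Dense(32, activation='relu')(input_img)      # compressed representation
decoded = Dense(784, activation='sigmoid')(encoded)    # reconstruction x_hat

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# autoencoder.fit(x_train, x_train, ...)  # trained to reproduce its own input
```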
63. 64
• Has 3 types of parameters
▫ W – Hidden weights
▫ U – Hidden to Hidden weights
▫ V – Hidden to Label weights
• W, U and V are shared across all time steps
Recurrent Neural Networks1
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
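A hedged NumPy sketch of the recurrence these three weight matrices define (dimensions are made up for illustration; this is not code from the tutorial):
```python
# Minimal RNN forward pass: W (input->hidden), U (hidden->hidden), V (hidden->label)
# are reused at every time step, i.e. shared across the whole sequence.
import numpy as np

n_in, n_hidden, n_out = 8, 16, 4
W = np.random.randn(n_hidden, n_in)
U = np.random.randn(n_hidden, n_hidden)
V = np.random.randn(n_out, n_hidden)

def rnn_forward(xs):
    h = np.zeros(n_hidden)
    outputs = []
    for x in xs:                      # one step per element of the sequence
        h = np.tanh(W @ x + U @ h)    # new hidden state from input and previous state
        outputs.append(V @ h)         # label scores at this step
    return outputs

print(len(rnn_forward([np.random.randn(n_in) for _ in range(5)])))  # -> 5
```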
64. 65
Where can Recurrent Neural Networks be used?1
1. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
1. Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image
classification).
2. Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
3. Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive
or negative sentiment).
4. Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in
English and then outputs a sentence in French).
5. Synced sequence input and output (e.g. video classification where we wish to label each frame of
the video).
65. 66
• Andrej Karpathy’s article
▫ http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• Hand writing generation demo
▫ http://www.cs.toronto.edu/~graves/handwriting.html
Sample applications
66. 67
Recurrent Neural Networks
• A recurrent neural network can be thought of as multiple copies of
the same network, each passing a message to a successor. 1
• Backpropagation (computing the gradient with respect to all parameters of the
network), the process used to propagate errors and update weights, needs to
be modified for RNNs because of the loops in the network
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
67. 68
• BPTT begins by unfolding a recurrent neural network through time
as shown in the figure.
• Training then proceeds in a manner similar to training a feed-
forward neural network with backpropagation, except that the
training patterns are visited in sequential order.
Back Propagation through time (BPTT)1
1. https://en.wikipedia.org/wiki/Backpropagation_through_time
68. 69
• Backpropagation through time (BPTT) for RNNs is difficult due to a
problem known as the vanishing/exploding gradient, i.e., the gradient
becomes extremely small or large as it is propagated back towards the
earliest time steps of the unrolled network.
• This is addressed by LSTM RNNs. Instead of neurons, LSTMs use
memory cells 1
Addressing the problem of Vanishing/Exploding gradient
http://deeplearning.net/tutorial/lstm.html
69. 70
• Dataset of 25,000 movie reviews from IMDB, labeled by sentiment
(positive/negative).
• Reviews have been preprocessed, and each review is encoded as a sequence of
word indexes (integers).
• For convenience, words are indexed by overall frequency in the dataset, so that
for instance the integer "3" encodes the 3rd most frequent word in the data.
• The 2011 paper (see below) had approximately 88% accuracy
• See
▫ https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
▫ http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-
networks-python-keras/
▫ http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf
Demo – IMDB Dataset
70. 71
Network
The 5,000 most frequent words are kept and each is mapped to a 32-dimensional embedding vector
Sequences are restricted to 500 words: longer reviews are truncated, shorter ones are padded
LSTM layer with 100 output dimensions
Accuracy: 84.08%
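A sketch of the network summarized above, using the parameters from this slide (5,000-word vocabulary, 32-dimensional embeddings, 500-word sequences, 100 LSTM units). Argument names follow recent Keras releases; the original demo may use older Keras 1 conventions such as nb_words/nb_epoch.
```python
# Sketch of the IMDB sentiment model described on this slide.
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

top_words, max_len = 5000, 500
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
X_train = sequence.pad_sequences(X_train, maxlen=max_len)   # truncate/pad to 500
X_test = sequence.pad_sequences(X_test, maxlen=max_len)

model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_len))   # 32-dim word vectors
model.add(LSTM(100))                                        # 100 output dimensions
model.add(Dense(1, activation='sigmoid'))                   # positive/negative
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, batch_size=64,
          validation_data=(X_test, y_test))
```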
72. 73
• Neural Networks are resource intensive
▫ Typically require huge dedicated hardware (RAM, GPUs)
• Parameter space huge! – 100s of thousands of parameters
▫ Tuning is important
• Architecture choice is important:
▫ See http://www.asimovinstitute.org/neural-network-zoo/
Key takeaways from modeling Deep Neural Networks
73. What is Spark ?
• Apache Spark™ is a fast and general engine for large-scale data
processing.
• Run programs up to 100x faster than Hadoop MapReduce
in memory, or 10x faster on disk.
Lightning-fast cluster computing
74. Why Spark ?
Generality
• Combine SQL, streaming, and
complex analytics.
• Spark powers a stack of high-level
tools including:
1. Spark Streaming: processing real-time
data streams
2. Spark SQL and DataFrames: support
for structured data and relational
queries
3. MLlib: built-in machine learning library
4. GraphX: Spark’s new API for graph
processing
75. 76
• Investment : Enterprises have significantly invested in Big-Data
infrastructure
• GPUs – Require specialized hardware – Niche Use-cases
• Can enterprises reuse existing infrastructure for deep learning
applications?
• What use-cases in Deep learning can leverage Apache Spark?
Deep Learning + Apache Spark ?
76. 77
• Databricks – Platform for running Spark applications
• BigDL – Intel’s library for deep learning on existing data frameworks.
• TensorflowOnSpark – Yahoo’s Distributed Deep Learning on Big Data
Clusters
• The Rest:
▫ SparkNet – AMPLab’s framework for training deep networks in Spark
▫ DeepLearning4J – Uses data parallelism to train separate neural
networks in parallel
▫ DeepDist - Lightning-Fast Deep Learning on Spark Via parallel stochastic
gradient updates
Efforts on using Deep Learning Frameworks with Spark
77. 78
• Deploying trained models to make predictions on data stored in
Spark RDDs or Dataframes
Inception model: https://www.tensorflow.org/tutorials/image_recognition
Each prediction requires about 4.8 billion operations
Parallelizing with Spark helps scale operations
Databricks
https://databricks.com/blog/2016/12/21/deep-learning-on-
databricks.html
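One common way to implement this (a sketch based on the general pattern, not the Databricks blog code; the model here is a hypothetical small Keras network rather than Inception) is to broadcast the trained model's weights and rebuild the model once per partition before scoring.
```python
# Hypothetical sketch: score a Spark RDD of feature vectors with a trained Keras model.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dl-inference").getOrCreate()
sc = spark.sparkContext

def build_model():
    from keras.models import Sequential
    from keras.layers import Dense
    m = Sequential()
    m.add(Dense(64, input_dim=20, activation='relu'))
    m.add(Dense(1, activation='sigmoid'))
    m.compile(loss='binary_crossentropy', optimizer='adam')
    return m

trained = build_model()                      # assume it was trained elsewhere
weights_bc = sc.broadcast(trained.get_weights())   # ship only the weights

def predict_partition(rows):
    import numpy as np
    model = build_model()                    # rebuilt once per partition
    model.set_weights(weights_bc.value)
    X = np.array(list(rows))
    if len(X) == 0:
        return iter([])
    return iter(model.predict(X).ravel().tolist())

features_rdd = sc.parallelize([[0.1] * 20, [0.9] * 20])   # stand-in for real data
print(features_rdd.mapPartitions(predict_partition).collect())
```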
78. 79
• Distributed model training
Use deep learning libraries like TensorFlow to test different model
hyperparameters on each worker
Task parallelism
Databricks
https://databricks.com/blog/2016/12/21/deep-learning-on-
databricks.html
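A minimal sketch of this task-parallel pattern (an assumption about the approach, not the Databricks code): each Spark task trains its own small Keras model with a different hyperparameter value and reports its score back to the driver.
```python
# Hypothetical sketch: one Keras model per hyperparameter setting,
# trained in parallel across Spark workers (task parallelism).
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hyperparam-search").getOrCreate()
sc = spark.sparkContext

# Small synthetic dataset, broadcast so every worker gets a copy.
X = np.random.rand(1000, 20)
y = (X.sum(axis=1) > 10).astype(int)
data_bc = sc.broadcast((X, y))

def train_and_score(hidden_units):
    # Import Keras inside the task so it is loaded on the worker.
    from keras.models import Sequential
    from keras.layers import Dense
    X, y = data_bc.value
    model = Sequential()
    model.add(Dense(hidden_units, input_dim=20, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)
    _, acc = model.evaluate(X, y, verbose=0)
    return hidden_units, acc

results = sc.parallelize([16, 32, 64, 128], numSlices=4).map(train_and_score).collect()
print(sorted(results, key=lambda r: -r[1]))   # best hyperparameter first
```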
79. 80
• Tensorframes
Experimental TensorFlow binding for Scala and Apache Spark.
TensorFrames (TensorFlow on Spark Dataframes) lets you manipulate
Apache Spark's DataFrames with TensorFlow programs.
TensorFrames is available as a Spark package.
Databricks
https://github.com/databricks/tensorframes
80. 81
• BigDL is an open source,
distributed deep learning
library for Apache Spark that
has feature parity with
existing popular deep learning
frameworks like Torch and
Caffe
• BigDL is a standalone Spark
package
Intel’s BigDL library
https://www.oreilly.com/ideas/deep-learning-for-apache-spark
81. 82
• BigDL uses Intel Math Kernel Library, a fast math library for Intel and
compatible processors to facilitate multi-threaded programming in
each Spark task.
• The MKL library enables efficient training of larger models across a
cluster (using distributed, synchronous mini-batch SGD)
• Key Value proposition:
▫ “The typical deep learning pipeline that involves data preprocessing
and preparation on a Spark cluster and model training on a server with
multiple GPUs, now involves a simple Spark library that runs on the
same cluster used for data preparation and storage.”
Intel’s BigDL library
https://www.oreilly.com/ideas/deep-learning-for-apache-spark
82. 83
• Existing DL frameworks often require setting up separate clusters for
deep learning, forcing us to create multiple programs for a machine
learning pipeline
TensorflowOnSpark,
CaffeOnSpark – Yahoo’s Distributed Deep Learning
https://github.com/yahoo/TensorFlowOnSpark
http://yahoohadoop.tumblr.com/post/157196317141/open-sourcing-
tensorflowonspark-distributed-deep
83. 84
• TensorFlowOnSpark supports all types of TensorFlow programs,
enabling both asynchronous and synchronous training and
inferencing. It supports model parallelism and data parallelism.
https://github.com/yahoo/TensorFlowOnSpark
http://yahoohadoop.tumblr.com/post/157196317141/open-sourcing-
tensorflowonspark-distributed-deep
TensorflowOnSpark,
CaffeOnSpark – Yahoo’s Distributed Deep Learning
84. 85
• Developed at UC Berkeley’s AMPLab
• SparkNet is built on top of Spark and Caffe.
• Not much activity in the last year https://github.com/amplab/SparkNet
• SparkNet's parallelized stochastic gradient descent (SGD) algorithm requires
minimal communication between nodes
SparkNet
https://arxiv.org/pdf/1511.06051v1.pdf
85. 86
• Deeplearning4j (DL4J) leverages Spark clusters for fast, distributed,
in-memory training of DL models developed in Scala or Java
• A centralized DL model iteratively averages the parameters
produced by separate neural nets.
DeepLearning4J
https://deeplearning4j.org/spark.html#how
86. 87
• Leverages Spark and asynchronous SGD to accelerate Deep Learning
training from HDFS/Spark data
• DeepDist fetches the model from the master and calls gradient().
After computing gradients on the data partitions, gradient updates
are sent back to the server. On the server, the master model is updated
by descent() using the updates from the nodes.
DeepDist
http://deepdist.com/
87. 88
• Databricks – Platform for running Spark applications
• BigDL – Intel’s library for deep learning on existing data frameworks.
• TensorflowOnSpark – Yahoo’s Distributed Deep Learning on Big Data
Clusters
• The Rest:
▫ SparkNet – AMPLab’s framework for training deep networks in Spark
▫ DeepLearning4J – Uses data parallelism to train separate neural
networks in parallel
▫ DeepDist - Lightning-Fast Deep Learning on Spark Via parallel stochastic
gradient updates
Efforts on using Deep Learning Frameworks with Spark
88. 89
• QuantUniversity has started a new initiative to support students and
unemployed professionals interested in fintech and data science
roles by enabling them to attend our workshops for free or at a reduced cost.
• If you or someone you know is interested in attending our
workshops for free or at a significantly discounted price, apply for a
scholarship here
• If you want to join us in supporting this initiative through a
sponsorship, please contact us. We are on a mission to democratize
Analytics education and we seek your support in making it possible!
QuantUniversity’s Analytics for a cause Initiative
90. Thank you!
Checkout our programs at:
www.analyticscertificate.com
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.