Deep Learning Cases:
Text and Image Processing
Grigory Sapunov
Founders & Developers: Deep Learning Unicorns
Moscow 03.04.2016
gs@inten.to
“Simple” Image & Video Processing
Simple tasks: Classification and Detection
http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf
Detection is harder than classification, but both are nearly solved,
on some benchmarks with better-than-human quality.
Case #1: IJCNN 2011
The German Traffic Sign Recognition Benchmark
● Classification, >40 classes
● >50,000 real-life images
● First Superhuman Visual Pattern Recognition
○ 2x better than humans
○ 3x better than the closest artificial competitor
○ 6x better than the best non-neural method
#  Method             Correct   Error
1  Committee of CNNs  99.46 %   0.54 %
2  Human Performance  98.84 %   1.16 %
3  Multi-Scale CNNs   98.31 %   1.69 %
4  Random Forests     96.14 %   3.86 %
http://people.idsia.ch/~juergen/superhumanpatternrecognition.html
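The "Nx better" figures can be read as ratios of error rates. The slide does not say how the multiples were derived, so this is a minimal sketch under that assumption, using only the error percentages from the table. The names `ERROR_PERCENT` and `error_ratio` are illustrative.

```python
# Error rates (in percent) copied from the results table above.
ERROR_PERCENT = {
    "Committee of CNNs": 0.54,
    "Human Performance": 1.16,
    "Multi-Scale CNNs": 1.69,
    "Random Forests": 3.86,
}


def error_ratio(baseline, winner="Committee of CNNs"):
    """How many times larger the baseline's error is than the winner's."""
    return ERROR_PERCENT[baseline] / ERROR_PERCENT[winner]


if __name__ == "__main__":
    for name in ("Human Performance", "Multi-Scale CNNs", "Random Forests"):
        print(f"{name}: {error_ratio(name):.1f}x the winner's error")
```

With the table's numbers the ratios come to roughly 2.1, 3.1 and 7.1, so the multiples on the slide appear to be rounded.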
Case #2: ILSVRC 2010-2015
Large Scale Visual Recognition Challenge (ILSVRC)
● Object detection (200 categories, ~0.5M images)
● Classification + localization (1000 categories, 1.2M images)
Case #2: ILSVRC 2010-2015
● Blue: Traditional CV
● Purple: Deep Learning
● Red: Human
Examples: Object Detection
Example: Face Detection + Emotion Classification
Example: Face Detection + Classification + Regression
Examples: Food Recognition
Examples: Computer Vision on the Road
Examples: Pedestrian Detection
Examples: Activity Recognition
Examples: Road Sign Recognition (on mobile!)
● NVIDIA Jetson TK1/TX1
○ 192/256 CUDA cores
○ Quad-core ARM Cortex-A15 (32-bit) / Cortex-A57 (64-bit) CPU, 2/4 GB memory
● Raspberry Pi 3
○ 1.2 GHz 64-bit quad-core ARM Cortex-A53, 1 GB SDRAM, US$35
● Tablets, Smartphones
● Google Project Tango
Deep Learning goes mobile!
...even more mobile
http://www.digitaltrends.com/cool-tech/swiss-drone-ai-follows-trails/
This drone can automatically follow forest
trails to track down lost hikers
...even a homemade automobile
Meet the 26-Year-Old Hacker Who Built a Self-Driving Car... in His Garage
https://www.youtube.com/watch?v=KTrgRYa2wbI
More complex Image & Video Processing
https://www.youtube.com/watch?v=ZJMtDRbqH40
NYU Semantic Segmentation with a Convolutional Network (33 categories)
Semantic Segmentation
Caption Generation
http://arxiv.org/abs/1411.4555 “Show and Tell: A Neural Image Caption Generator”
Example: NeuralTalk and Walk
Ingredients:
● https://github.com/karpathy/neuraltalk2
Project for learning Multimodal Recurrent Neural Networks that describe
images with sentences
● Webcam/notebook
Result:
● https://vimeo.com/146492001
More hacking: NeuralTalk and Walk
Product of the near future: DenseCap and ?
http://arxiv.org/abs/1511.07571 DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Image Colorization
http://richzhang.github.io/colorization/
Visual Question Answering
https://avisingh599.github.io/deeplearning/visual-qa/
Reinforcement Learning
Controlling a simulated car from video input (2013)
http://people.idsia.ch/~juergen/gecco2013torcs.pdf
http://people.idsia.ch/~juergen/compressednetworksearch.html
Reinforcement Learning
Human-level control through deep reinforcement learning (2015)
http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
Playing Atari with Deep Reinforcement Learning (2013)
http://arxiv.org/abs/1312.5602
Reinforcement Learning
Fun: Deep Dream
http://blogs.wsj.com/digits/2016/02/29/googles-computers-paint-like-van-gogh-and-the-art-sells-for-thousands/
More Fun: Neural Style
http://www.dailymail.co.uk/sciencetech/article-3214634/The-algorithm-learn-copy-artist-Neural-network-recreate-snaps-style-Van-Gogh-Picasso.html
More Fun: Neural Style
http://www.boredpanda.com/inceptionism-neural-network-deep-dream-art/
More Fun: Photo-realistic Synthesis
http://arxiv.org/abs/1601.04589 Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis
More Fun: Neural Doodle
http://arxiv.org/abs/1603.01768 Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks
(a) Original painting by Renoir, (b) semantic annotations,
(c) desired layout, (d) generated output.
Text Processing / NLP
Deep Learning and NLP
Variety of tasks:
● Finding synonyms
● Fact extraction: people and company names, geography, prices, dates,
product names, …
● Classification: genre and topic detection, positive/negative sentiment
analysis, authorship detection, …
● Machine translation
● Search (written and spoken)
● Question answering
● Dialog systems
● Language modeling, part-of-speech tagging
https://code.google.com/archive/p/word2vec/
Example: Semantic Spaces (word2vec, GloVe)
http://nlp.stanford.edu/projects/glove/
Example: Semantic Spaces (word2vec, GloVe)
Encoding semantics
Using word2vec vectors instead of word indexes lets you handle word meanings
better (e.g. there is no need to enumerate all synonyms, because their vectors
are already close to each other).
But the naive way of working with word2vec vectors still gives you a “bag of
words” model, in which the phrases “The man killed the tiger” and “The tiger
killed the man” are equal.
You need models that pay attention to word order: paragraph2vec, sentence
embeddings (using RNN/LSTM), even World2Vec (LeCun @CVPR2015).
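Why the naive approach loses word order can be shown with a few lines of code. This is a minimal sketch, not real word2vec: `TOY_VECTORS` holds made-up integer vectors chosen only for illustration, and summing them stands in for the usual "add or average the word vectors" recipe.

```python
# Made-up 3-dimensional integer "embeddings" (NOT real word2vec vectors).
TOY_VECTORS = {
    "the": (1, 0, 2),
    "man": (9, 3, 1),
    "killed": (2, 8, 5),
    "tiger": (7, 1, 9),
}


def tokenize(sentence):
    """Lowercase and split on whitespace."""
    return sentence.lower().split()


def bag_of_vectors(sentence):
    """Sum the word vectors; the order of the words does not matter."""
    total = [0, 0, 0]
    for token in tokenize(sentence):
        for i, value in enumerate(TOY_VECTORS[token]):
            total[i] += value
    return tuple(total)


def sequence_of_vectors(sentence):
    """Keep one vector per position; this is what an RNN/LSTM would consume."""
    return [TOY_VECTORS[token] for token in tokenize(sentence)]


a = "The man killed the tiger"
b = "The tiger killed the man"

if __name__ == "__main__":
    print(bag_of_vectors(a) == bag_of_vectors(b))            # same bag
    print(sequence_of_vectors(a) == sequence_of_vectors(b))  # different order
```

The summed representation is identical for both phrases, while the per-position sequence differs. That per-position input is what order-aware models work from.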
Multi-modal learning
http://arxiv.org/abs/1411.2539 Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Example: More multi-modal learning
Case: Sentiment analysis
http://nlp.stanford.edu/sentiment/
Can capture complex cases where bag-of-words models fail.
“This movie was actually neither that funny, nor super witty.”
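A rough illustration of how a bag-of-words model fails on this sentence. This is a sketch with a tiny hypothetical word lexicon (`TOY_LEXICON`), not the Stanford model: it scores each word independently, so the negation "neither ... nor" is ignored.

```python
import re

# Hypothetical sentiment lexicon, made up for this illustration.
TOY_LEXICON = {"funny": 1, "witty": 1, "boring": -1, "bad": -1}


def bag_of_words_score(text):
    """Add up per-word scores; word order and negation are ignored."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(token, 0) for token in tokens)


sentence = "This movie was actually neither that funny, nor super witty."

if __name__ == "__main__":
    score = bag_of_words_score(sentence)
    print("positive" if score > 0 else "negative or neutral", score)
```

The sentence is a negative review, yet the word-level score comes out positive because "funny" and "witty" are counted at face value. Handling this requires a model that takes sentence structure into account.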
Case: Machine Translation
Sequence to Sequence Learning with Neural Networks, http://arxiv.org/abs/1409.3215
Case: Automated Speech Translation
Translating voice calls and video calls in 7 languages and instant messages in over 50.
https://www.skype.com/en/features/skype-translator/
Case: Baidu Automated Speech Recognition (ASR)
More Fun: MtG cards
http://www.escapistmagazine.com/articles/view/scienceandtech/14276-Magic-The-Gathering-Cards-Made-by-Artificial-Intelligence
Case: Question Answering
A Neural Network for Factoid Question Answering over Paragraphs, https://cs.umd.edu/~miyyer/qblearn/
Case: Dialogue Systems
A Neural Conversational Model,
Oriol Vinyals, Quoc Le
http://arxiv.org/abs/1506.05869
What for: Conversational Commerce
https://medium.com/chris-messina/2016-will-be-the-year-of-conversational-commerce-1586e85e3991
What for: Conversational Commerce
Summary
Why is Deep Learning helpful? Or even a game-changer?
● Works on raw data (pixels, sound, text or chars), no need for feature
engineering
○ Some features are really hard to develop (they require years of work by a
group of experts)
○ Some features are patented (e.g. SIFT, SURF for images)
● Allows end-to-end learning (pixels to category, sound to sentence, English
sentence to Chinese sentence, etc.)
○ No need to do segmentation, etc. (a lot of manual labor)
⇒ You can iterate faster (and get superior quality at the same time!)
Still some issues exist
● No dataset -- no deep learning
○ There is a lot of data available (and deep learning requires it;
otherwise simpler models could do better)
○ But sometimes you have no dataset…
■ Nonetheless, some hacks are available: transfer learning, data
augmentation, Mechanical Turk, …
● Requires a lot of computation
○ No cluster or GPU machines -- much more time required
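Of the hacks listed above, data augmentation is the easiest to sketch. The example below assumes images are plain nested lists of pixel values and uses only random crops and horizontal flips. The function names are illustrative, and real pipelines usually rely on an image library instead.

```python
import random


def horizontal_flip(image):
    """Mirror each row of a 2-D image (a list of rows)."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, crop_h, crop_w, n_samples, seed=0):
    """Produce n_samples randomly cropped (and sometimes flipped) variants."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        sample = random_crop(image, crop_h, crop_w, rng)
        if rng.random() < 0.5:
            sample = horizontal_flip(sample)
        samples.append(sample)
    return samples


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy image
    for variant in augment(image, 3, 3, n_samples=3):
        print(variant)
```

Each source image yields several slightly different training examples, which stretches a small dataset without collecting new data.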
So what to do next?
Universal Libraries and Frameworks
● Torch7 (http://torch.ch/)
● TensorFlow (https://www.tensorflow.org/)
● Theano (http://deeplearning.net/software/theano/)
○ Keras (http://keras.io/)
○ Lasagne (https://github.com/Lasagne/Lasagne)
○ blocks (https://github.com/mila-udem/blocks)
○ pylearn2 (https://github.com/lisa-lab/pylearn2)
● CNTK (http://www.cntk.ai/)
● Neon (http://neon.nervanasys.com/)
● Deeplearning4j (http://deeplearning4j.org/)
● Google Prediction API (https://cloud.google.com/prediction/)
● …
● http://deeplearning.net/software_links/
Libraries & Frameworks for image/video processing
● OpenCV (http://opencv.org/)
● Caffe (http://caffe.berkeleyvision.org/)
● Torch7 (http://torch.ch/)
● clarifai (http://clarif.ai/)
● Google Vision API (https://cloud.google.com/vision/)
● …
● + all universal libraries
Libraries & Frameworks for speech
● CNTK (http://www.cntk.ai/)
● KALDI (http://kaldi-asr.org/)
● Google Speech API (https://cloud.google.com/)
● Yandex SpeechKit (https://tech.yandex.ru/speechkit/)
● Baidu Speech API (http://www.baidu.com/)
● wit.ai (https://wit.ai/)
● …
Libraries & Frameworks for text processing
● Torch7 (http://torch.ch/)
● Theano/Keras/…
● TensorFlow (https://www.tensorflow.org/)
● MetaMind (https://www.metamind.io/)
● Google Translate API (https://cloud.google.com/translate/)
● …
● + all universal libraries
What to read and where to study?
- CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei
Li, Andrej Karpathy, Stanford
(http://vision.stanford.edu/teaching/cs231n/index.html)
- CS224d: Deep Learning for Natural Language Processing, Richard
Socher, Stanford (http://cs224d.stanford.edu/index.html)
- Neural Networks for Machine Learning, Geoffrey Hinton
(https://www.coursera.org/course/neuralnets)
- Computer Vision course collection
(http://eclass.cc/courselists/111_computer_vision_and_navigation)
- Deep learning course collection
(http://eclass.cc/courselists/117_deep_learning)
- Book “Deep Learning”, Ian Goodfellow, Yoshua Bengio and Aaron Courville
(http://www.deeplearningbook.org/)
What to read and where to study?
- Google+ Deep Learning community
(https://plus.google.com/communities/112866381580457264725)
- VK Deep Learning community (http://vk.com/deeplearning)
- Quora (https://www.quora.com/topic/Deep-Learning)
- FB Deep Learning Moscow
(https://www.facebook.com/groups/1505369016451458/)
- Twitter Deep Learning Hub (https://twitter.com/DeepLearningHub)
- NVidia blog (https://devblogs.nvidia.com/parallelforall/tag/deep-learning/)
- IEEE Spectrum blog (http://spectrum.ieee.org/blog/cars-that-think)
- http://deeplearning.net/
- Arxiv Sanity Preserver http://www.arxiv-sanity.com/
- ...
Whom to follow?
- Jürgen Schmidhuber (http://people.idsia.ch/~juergen/)
- Geoffrey E. Hinton (http://www.cs.toronto.edu/~hinton/)
- Google DeepMind (http://deepmind.com/)
- Yann LeCun (http://yann.lecun.com, https://www.facebook.com/yann.lecun)
- Yoshua Bengio (http://www.iro.umontreal.ca/~bengioy,
https://www.quora.com/profile/Yoshua-Bengio)
- Andrej Karpathy (http://karpathy.github.io/)
- Andrew Ng (http://www.andrewng.org/)
- ...
https://ru.linkedin.com/in/grigorysapunov
gs@inten.to
Thanks!

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

Deep Learning Cases: Text and Image Processing

  • 1. Deep Learning Cases: Text and Image Processing Grigory Sapunov Founders & Developers: Deep Learning Unicorns Moscow 03.04.2016 gs@inten.to
  • 2. “Simple” Image & Video Processing
  • 3. Simple tasks: Classification and Detection http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf Detection is harder than classification, but both are nearly solved, and with better-than-human quality.
  • 4. Case #1: IJCNN 2011 The German Traffic Sign Recognition Benchmark ● Classification, >40 classes ● >50,000 real-life images ● First Superhuman Visual Pattern Recognition ○ 2x better than humans ○ 3x better than the closest artificial competitor ○ 6x better than the best non-neural method Method Correct (Error) 1 Committee of CNNs 99.46 % (0.54%) 2 Human Performance 98.84 % (1.16%) 3 Multi-Scale CNNs 98.31 % (1.69%) 4 Random Forests 96.14 % (3.86%) http://people.idsia.ch/~juergen/superhumanpatternrecognition.html
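The "Nx better" bullets on this slide can be checked against the error column of the table. The sketch below is only illustrative arithmetic: the error rates are copied from the slide, while the names `errors` and `error_ratio` are hypothetical helpers introduced here. It assumes "Nx better" means the ratio of error rates relative to the committee of CNNs.

```python
# Error rates (%) copied from the results table on the slide.
errors = {
    "Committee of CNNs": 0.54,
    "Human Performance": 1.16,
    "Multi-Scale CNNs": 1.69,
    "Random Forests": 3.86,
}


def error_ratio(method, baseline="Committee of CNNs"):
    """How many times larger the error of `method` is than the baseline's."""
    return errors[method] / errors[baseline]


if __name__ == "__main__":
    for name in errors:
        print(f"{name}: {error_ratio(name):.2f}x the error of the CNN committee")
```

Under that assumption the table gives ratios of roughly 2.1 (humans), 3.1 (multi-scale CNNs) and 7.1 (random forests), the same order of magnitude as the rounded figures in the bullets.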
  • 5. Case #2: ILSVRC 2010-2015 Large Scale Visual Recognition Challenge (ILSVRC) ● Object detection (200 categories, ~0.5M images) ● Classification + localization (1000 categories, 1.2M images)
  • 6. Case #2: ILSVRC 2010-2015 ● Blue: Traditional CV ● Purple: Deep Learning ● Red: Human
  • 8. Example: Face Detection + Emotion Classification
  • 9. Example: Face Detection + Classification + Regression
  • 14. Examples: Road Sign Recognition (on mobile!)
  • 15. ● NVidia Jetson TK1/TX1 ○ 192/256 CUDA Cores ○ 64-bit Quad-Core ARM A15/A57 CPU, 2/4 Gb Mem ● Raspberry Pi 3 ○ 1.2 GHz 64-bit quad-core ARM Cortex-A53, 1 Gb SDRAM, US$35 ● Tablets, Smartphones ● Google Project Tango Deep Learning goes mobile!
  • 16. ...even more mobile http://www.digitaltrends.com/cool-tech/swiss-drone-ai-follows-trails/ This drone can automatically follow forest trails to track down lost hikers
  • 17. ...even homemade automobile Meet the 26-Year-Old Hacker Who Built a Self-Driving Car... in His Garage https://www.youtube.com/watch?v=KTrgRYa2wbI
  • 18. More complex Image & Video Processing
  • 19. https://www.youtube.com/watch?v=ZJMtDRbqH40 NYU Semantic Segmentation with a Convolutional Network (33 categories) Semantic Segmentation
  • 20. Caption Generation http://arxiv.org/abs/1411.4555 “Show and Tell: A Neural Image Caption Generator”
  • 22. Example: NeuralTalk and Walk Ingredients: ● https://github.com/karpathy/neuraltalk2 Project for learning Multimodal Recurrent Neural Networks that describe images with sentences ● Webcam/notebook Result: ● https://vimeo.com/146492001
  • 24. Product of the near future: DenseCap and ? http://arxiv.org/abs/1511.07571 DenseCap: Fully Convolutional Localization Networks for Dense Captioning
  • 27. Reinforcement Learning Driving a simulated car from video input (2013) http://people.idsia.ch/~juergen/gecco2013torcs.pdf http://people.idsia.ch/~juergen/compressednetworksearch.html
  • 29. Reinforcement Learning Human-level control through deep reinforcement learning (2015) http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html Playing Atari with Deep Reinforcement Learning (2013) http://arxiv.org/abs/1312.5602
  • 33. More Fun: Neural Style http://www.dailymail.co.uk/sciencetech/article-3214634/The-algorithm-learn-copy-artist-Neural-network-recreate-snaps-style-Van-Gogh-Picasso.html
  • 34. More Fun: Neural Style http://www.boredpanda.com/inceptionism-neural-network-deep-dream-art/
  • 35. More Fun: Photo-realistic Synthesis http://arxiv.org/abs/1601.04589 Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis
  • 36. More Fun: Neural Doodle http://arxiv.org/abs/1603.01768 Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks (a) Original painting by Renoir, (b) semantic annotations, (c) desired layout, (d) generated output.
  • 38. Deep Learning and NLP Variety of tasks: ● Finding synonyms ● Fact extraction: people and company names, geography, prices, dates, product names, … ● Classification: genre and topic detection, positive/negative sentiment analysis, authorship detection, … ● Machine translation ● Search (written and spoken) ● Question answering ● Dialog systems ● Language modeling, part-of-speech tagging
  • 41. Encoding semantics Using word2vec vectors instead of word indexes lets you deal better with word meanings (e.g. there is no need to enumerate all synonyms, because their vectors are already close to each other). But the naive way of working with word2vec vectors still gives you a “bag of words” model, in which the phrases “The man killed the tiger” and “The tiger killed the man” are equal. We need models that pay attention to word order: paragraph2vec, sentence embeddings (using RNN/LSTM), even World2Vec (LeCun @CVPR2015).
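A minimal sketch of the bag-of-words problem described on this slide, assuming the "naive way" means averaging the word vectors of a sentence. The three-dimensional `toy_vectors` are made-up numbers for illustration, not real word2vec output, and the function names are hypothetical.

```python
import math

# Made-up 3-d "embeddings" for illustration only; real word2vec vectors
# are learned from a corpus and usually have 100-300 dimensions.
toy_vectors = {
    "the": [0.1, 0.0, 0.2],
    "man": [0.9, 0.3, 0.1],
    "killed": [0.2, 0.8, 0.5],
    "tiger": [0.7, 0.1, 0.9],
}


def average_embedding(sentence):
    """Naive sentence vector: the mean of the word vectors (order is lost)."""
    words = sentence.lower().split()
    dims = len(toy_vectors["the"])
    sums = [0.0] * dims
    for word in words:
        for i, value in enumerate(toy_vectors[word]):
            sums[i] += value
    return [s / len(words) for s in sums]


def ordered_embedding(sentence):
    """Order-aware input: the sequence of word vectors, as an RNN would see it."""
    return [toy_vectors[word] for word in sentence.lower().split()]


a = "The man killed the tiger"
b = "The tiger killed the man"

# The averaged vectors coincide (up to floating-point rounding) ...
same_bag = all(
    math.isclose(x, y) for x, y in zip(average_embedding(a), average_embedding(b))
)
# ... while the ordered sequences differ.
same_sequence = ordered_embedding(a) == ordered_embedding(b)

if __name__ == "__main__":
    print("averaged vectors equal:", same_bag)
    print("ordered sequences equal:", same_sequence)
```

The sequence form is what order-sensitive models such as RNN/LSTM sentence encoders consume, which is why they can tell the two phrases apart.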
  • 42. Multi-modal learning http://arxiv.org/abs/1411.2539 Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
  • 45. Case: Sentiment analysis http://nlp.stanford.edu/sentiment/ Can capture complex cases where bag-of-words models fail. “This movie was actually neither that funny, nor super witty.”
  • 46. Case: Machine Translation Sequence to Sequence Learning with Neural Networks, http://arxiv.org/abs/1409.3215
  • 47. Case: Automated Speech Translation Translating voice calls and video calls in 7 languages and instant messages in over 50. https://www.skype.com/en/features/skype-translator/
  • 48. Case: Baidu Automated Speech Recognition (ASR)
  • 49. More Fun: MtG cards http://www.escapistmagazine.com/articles/view/scienceandtech/14276-Magic-The-Gathering-Cards-Made-by-Artificial-Intelligence
  • 50. Case: Question Answering A Neural Network for Factoid Question Answering over Paragraphs, https://cs.umd.edu/~miyyer/qblearn/
  • 51. Case: Dialogue Systems A Neural Conversational Model, Oriol Vinyals, Quoc Le http://arxiv.org/abs/1506.05869
  • 52. What for: Conversational Commerce https://medium.com/chris-messina/2016-will-be-the-year-of-conversational-commerce-1586e85e3991
  • 55. Why is Deep Learning helpful? Or even a game-changer ● Works on raw data (pixels, sound, text or chars), no need for feature engineering ○ Some features are really hard to develop (requiring years of work by a group of experts) ○ Some features are patented (e.g. SIFT, SURF for images) ● Allows end-to-end learning (pixels to category, sound to sentence, English sentence to Chinese sentence, etc.) ○ No need to do segmentation, etc. (a lot of manual labor) ⇒ You can iterate faster (and get superior quality at the same time!)
  • 56. Some issues still exist ● No dataset, no deep learning. A lot of data is available (and deep learning requires it; otherwise simpler models may perform better) ○ But sometimes you have no dataset… ■ Still, some workarounds are available: transfer learning, data augmentation, Mechanical Turk, … ● Requires a lot of computation. Without a cluster or GPU machines, training takes much longer
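Of the workarounds listed on this slide, data augmentation is the easiest to sketch. The example below is an assumption-laden illustration, not a method from the talk: it treats an image as a nested list of pixel values and generates extra training samples with horizontal flips and sliding crops. The function names are hypothetical, and in practice a library routine would be used instead.

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def crops(image, size):
    """All size x size sub-images obtained by sliding a window over the image."""
    height, width = len(image), len(image[0])
    return [
        [row[x:x + size] for row in image[y:y + size]]
        for y in range(height - size + 1)
        for x in range(width - size + 1)
    ]


def augment(image, crop_size):
    """Return every crop of the image and of its mirrored copy."""
    variants = []
    for base in (image, horizontal_flip(image)):
        variants.extend(crops(base, crop_size))
    return variants


if __name__ == "__main__":
    image = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
    ]
    samples = augment(image, crop_size=2)
    print(len(samples), "training samples from one 3x3 image")
```

Such transformations only help when they preserve the label: mirroring is harmless for most object photos but can change the meaning of some road signs.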
  • 57. So what to do next?
  • 58. Universal Libraries and Frameworks ● Torch7 (http://torch.ch/) ● TensorFlow (https://www.tensorflow.org/) ● Theano (http://deeplearning.net/software/theano/) ○ Keras (http://keras.io/) ○ Lasagne (https://github.com/Lasagne/Lasagne) ○ blocks (https://github.com/mila-udem/blocks) ○ pylearn2 (https://github.com/lisa-lab/pylearn2) ● CNTK (http://www.cntk.ai/) ● Neon (http://neon.nervanasys.com/) ● Deeplearning4j (http://deeplearning4j.org/) ● Google Prediction API (https://cloud.google.com/prediction/) ● … ● http://deeplearning.net/software_links/
  • 59. Libraries & Frameworks for image/video processing ● OpenCV (http://opencv.org/) ● Caffe (http://caffe.berkeleyvision.org/) ● Torch7 (http://torch.ch/) ● clarifai (http://clarif.ai/) ● Google Vision API (https://cloud.google.com/vision/) ● … ● + all universal libraries
  • 60. Libraries & Frameworks for speech ● CNTK (http://www.cntk.ai/) ● KALDI (http://kaldi-asr.org/) ● Google Speech API (https://cloud.google.com/) ● Yandex SpeechKit (https://tech.yandex.ru/speechkit/) ● Baidu Speech API (http://www.baidu.com/) ● wit.ai (https://wit.ai/) ● …
  • 61. Libraries & Frameworks for text processing ● Torch7 (http://torch.ch/) ● Theano/Keras/… ● TensorFlow (https://www.tensorflow.org/) ● MetaMind (https://www.metamind.io/) ● Google Translate API (https://cloud.google.com/translate/) ● … ● + all universal libraries
  • 62. What to read and where to study? - CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei Li, Andrej Karpathy, Stanford (http://vision.stanford.edu/teaching/cs231n/index.html) - CS224d: Deep Learning for Natural Language Processing, Richard Socher, Stanford (http://cs224d.stanford.edu/index.html) - Neural Networks for Machine Learning, Geoffrey Hinton (https://www.coursera.org/course/neuralnets) - Computer Vision course collection (http://eclass.cc/courselists/111_computer_vision_and_navigation) - Deep learning course collection (http://eclass.cc/courselists/117_deep_learning) - Book “Deep Learning”, Ian Goodfellow, Yoshua Bengio and Aaron Courville (http://www.deeplearningbook.org/)
  • 63. What to read and where to study? - Google+ Deep Learning community (https://plus.google.com/communities/112866381580457264725) - VK Deep Learning community (http://vk.com/deeplearning) - Quora (https://www.quora.com/topic/Deep-Learning) - FB Deep Learning Moscow (https://www.facebook.com/groups/1505369016451458/) - Twitter Deep Learning Hub (https://twitter.com/DeepLearningHub) - NVidia blog (https://devblogs.nvidia.com/parallelforall/tag/deep-learning/) - IEEE Spectrum blog (http://spectrum.ieee.org/blog/cars-that-think) - http://deeplearning.net/ - Arxiv Sanity Preserver http://www.arxiv-sanity.com/ - ...
  • 64. Whom to follow? - Jürgen Schmidhuber (http://people.idsia.ch/~juergen/) - Geoffrey E. Hinton (http://www.cs.toronto.edu/~hinton/) - Google DeepMind (http://deepmind.com/) - Yann LeCun (http://yann.lecun.com, https://www.facebook.com/yann.lecun) - Yoshua Bengio (http://www.iro.umontreal.ca/~bengioy, https://www.quora.com/profile/Yoshua-Bengio) - Andrej Karpathy (http://karpathy.github.io/) - Andrew Ng (http://www.andrewng.org/) - ...