Recurrent Networks and Beyond
Tomas Mikolov, Facebook
Neu-IR Workshop, Pisa, Italy 2016
Goals of this talk
• Explain recent success of recurrent networks
• Understand better the concept of (longer) short term memory
• Explore limitations of recurrent networks
• Discuss what needs to be done to build machines that can
understand language
Tomas Mikolov, Facebook, 2016
Brief History of Recurrent Nets – 80’s & 90’s
• Recurrent network architectures were very popular in the 80’s and
early 90’s (Elman, Jordan, Mozer, Hopfield, Parallel Distributed
Processing group, …)
• The main idea is very attractive: to re-use parameters and
computation (usually over time)
Tomas Mikolov, Facebook, 2016
Simple RNN Architecture
• Input layer, hidden layer with recurrent
connections, and the output layer
• In theory, the hidden layer can learn
to represent unlimited memory
• Also called Elman network
(Finding structure in time, Elman 1990)
Tomas Mikolov, Facebook, 2016
Brief History of Recurrent Nets – 90’s - 2010
• After the initial excitement, recurrent nets vanished from the
mainstream research
• Despite being theoretically powerful models, RNNs were widely
considered too unstable to train
• Some success was achieved at IDSIA with the Long Short Term
Memory RNN architecture, but this model was too complex for others
to reproduce easily
Tomas Mikolov, Facebook, 2016
Brief History of Recurrent Nets – 2010 - today
• In 2010, it was shown that RNNs can significantly improve the state of
the art in language modeling, machine translation, data compression
and speech recognition (including a strong commercial speech
recognizer from IBM)
• The RNNLM toolkit was published to allow researchers to reproduce the
results and extend the techniques
• The key novel trick in RNNLM was trivial: clipping gradients to prevent
instability during training
Tomas Mikolov, Facebook, 2016
Brief History of Recurrent Nets – 2010 - today
• 21%–24% reduction of WER on the Wall Street Journal setup
Tomas Mikolov, Facebook, 2016
Brief History of Recurrent Nets – 2010 - today
• Improvement from RNNLM over n-gram increases with more data!
Tomas Mikolov, Facebook, 2016
Brief History of Recurrent Nets – 2010 - today
• Breakthrough result in 2011: 11% WER reduction over a large system from IBM
• Ensemble of big RNNLM models trained on a lot of data
Tomas Mikolov, Facebook, 2016
Brief History of Recurrent Nets – 2010 - today
• RNNs became much more accessible through open-source
implementations in general ML toolkits:
• Theano
• Torch
• PyBrain
• TensorFlow
• …
Tomas Mikolov, Facebook, 2016
Recurrent Nets Today
• Widely applied:
• ASR (both acoustic and language models)
• MT (language & translation & alignment models, joint models)
• Many NLP applications
• Video modeling, handwriting recognition, user intent prediction, …
• Downside: for many problems RNNs are too powerful, and models are
becoming unnecessarily complex
• Often, complicated RNN architectures are preferred for the wrong
reasons (easier to get a paper published and attract attention)
Tomas Mikolov, Facebook, 2016
Longer short term memory in simple RNNs
• How to add longer memory to RNNs without unnecessary complexity
• Paper: Learning Longer Memory in Recurrent Neural Networks
(Mikolov, Joulin, Chopra, Mathieu, Ranzato, ICLR Workshop 2015)
Tomas Mikolov, Facebook, 2016
Recurrent Network – Elman Architecture
• Also known as Simple Recurrent Network (SRN)
• Input layer x_t, hidden layer h_t, output y_t
• Weight matrices A, R, U
Tomas Mikolov, Facebook, 2016
Recurrent Network – Elman Architecture
• Input layer x_t, hidden layer h_t, output y_t
• Weight matrices A, R, U
h_t = σ(A x_t + R h_{t−1})
σ(x) = 1 / (1 + e^{−x})
y_t = f(U h_t)
f() is the softmax function
Tomas Mikolov, Facebook, 2016
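To make the update above concrete, here is a minimal NumPy sketch of one forward step of the SRN from this slide (the matrix names A, R, U follow the slide; the dimensions, initialization and helper names are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def srn_step(x_t, h_prev, A, R, U):
    """One Elman step: h_t = sigmoid(A x_t + R h_{t-1}), y_t = softmax(U h_t)."""
    h_t = sigmoid(A @ x_t + R @ h_prev)
    y_t = softmax(U @ h_t)
    return h_t, y_t

# Illustrative sizes: vocabulary of 10 symbols (one-hot inputs), 5 hidden units.
rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(5, 10))   # input -> hidden
R = rng.normal(scale=0.1, size=(5, 5))    # hidden -> hidden (recurrent)
U = rng.normal(scale=0.1, size=(10, 5))   # hidden -> output
x = np.zeros(10); x[3] = 1.0              # current input symbol, one-hot
h = np.zeros(5)                           # initial hidden state
h, y = srn_step(x, h, A, R, U)            # y: distribution over the next symbol
```

Unrolling srn_step over a sequence and backpropagating the prediction errors through the unrolled graph is the BPTT training referred to on the following slides.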
Simple Recurrent Net Problems
• Backpropagation through time algorithm + stochastic gradient
descent is commonly used for training (Rumelhart et al, 1985)
• Gradients can either vanish or explode (Hochreiter 1991;
Bengio 1994)
Tomas Mikolov, Facebook, 2016
Simple Recurrent Net: Exploding Gradients
• The gradients explode rarely, but this can have disastrous effects
• A simple “hack” is to clip the gradients to stay within some range
• This prevents exponential growth (which would otherwise lead to a giant
step in the weight update)
• One can also normalize the gradients, or discard the weight updates
that are too big
Tomas Mikolov, Facebook, 2016
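As a sketch of this trick: before each SGD weight update, every gradient component (or the whole gradient vector) is forced into a fixed range. The threshold values below are illustrative defaults, not values prescribed by the talk:

```python
import numpy as np

def clip_elementwise(grad, threshold=15.0):
    """Clip each gradient component to [-threshold, threshold]."""
    return np.clip(grad, -threshold, threshold)

def clip_by_norm(grad, max_norm=5.0):
    """Alternative: rescale the whole gradient if its norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    return grad if norm <= max_norm else grad * (max_norm / norm)

# SGD update with clipped gradients (learning rate 0.1 is illustrative):
# W -= 0.1 * clip_elementwise(dW)
```

Either variant prevents a single exploded gradient from producing a giant, destabilizing weight update.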
Simple Recurrent Net: Vanishing Gradients
• Most of the time, the gradients quickly vanish (after 5-10 steps of
backpropagation through time)
• This may not be a problem of SGD, but of the architecture of the SRN
Tomas Mikolov, Facebook, 2016
Simple Recurrent Net: Vanishing Gradients
• What recurrent architecture would be easier to train to capture
longer term patterns?
• Instead of a fully connected recurrent matrix, we can use an architecture
where each neuron is connected only to the input and to itself
• Old idea (Jordan 1987; Mozer 1989)
Tomas Mikolov, Facebook, 2016
Combination of both ideas: Elman + Mozer
• Part of the hidden layer is fully connected,
part is diagonal (self-connections)
• Can be seen as RNN with two
hidden layers
• Or as RNN with partially diagonal
recurrent matrix (+ linear hidden units)
Tomas Mikolov, Facebook, 2016
Combination of both ideas: Elman + Mozer
• The 𝛼 value can be learned, or kept
fixed close to 1 (we used 0.95)
• The 𝑃 matrix is optional
(usually helps a bit)
Tomas Mikolov, Facebook, 2016
Structurally Constrained Recurrent Net
• Because we constrain the architecture of SRN, we further denote the
model as Structurally Constrained Recurrent Net (SCRN)
• Alternative name is “slow recurrent nets”, as the state of the diagonal
layer changes slowly
Q: Wouldn’t it be enough to initialize the recurrent matrix to be diagonal?
A: No. This would degrade back to normal RNN and not learn longer memory.
Tomas Mikolov, Facebook, 2016
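A sketch of one SCRN step, following the equations of the ICLR workshop paper cited above: a slow context layer s_t with self-connection weight α sits next to the standard hidden layer h_t. The matrix names B, P, V beyond the slide's A, R, U, as well as the dimensions, are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def scrn_step(x_t, h_prev, s_prev, A, R, U, B, P, V, alpha=0.95):
    """One SCRN step: slowly changing context units s_t plus standard hidden units h_t."""
    s_t = (1.0 - alpha) * (B @ x_t) + alpha * s_prev   # "slow" layer: diagonal self-connections
    h_t = sigmoid(P @ s_t + A @ x_t + R @ h_prev)      # fully connected hidden layer
    y_t = softmax(U @ h_t + V @ s_t)                   # output reads both layers
    return h_t, s_t, y_t
```

With α fixed close to 1 (0.95 in the experiments), s_t is an exponentially decaying summary of the inputs, which is what links the SCRN to the cache models discussed in the results.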
Results
• Language modeling experiments: Penn Treebank, Text8
• Longer memory in language models is commonly called cache / topic
• Comparison to Long Short Term Memory RNNs (currently popular but
quite complicated architecture that can learn longer term patterns)
• Datasets & code: http://github.com/facebook/SCRNNs
(link is in the paper)
Tomas Mikolov, Facebook, 2016
Results: Penn Treebank language modeling
• Gain from SCRN / LSTM over simpler recurrent net is similar to gain from cache
• LSTM has 3 gates for each hidden unit, and thus 4x more parameters need to be
accessed during training for the given hidden layer size (=> slower to train)
• SCRN with 100 fully connected and 40 self-connected neurons is only slightly
more expensive to train than SRN
Tomas Mikolov, Facebook, 2016
MODEL | # hidden units | Perplexity
N-gram | - | 141
N-gram + cache | - | 125
SRN | 100 | 129
LSTM | 100 (x4 parameters) | 115
SCRN | 100 + 40 | 115
Results: Text8
• Text8: Wikipedia text (~17M words), much stronger effect from cache
• Big gain for both SCRN & LSTM over SRN
• For small models, SCRN seems to be superior (simpler architecture, better
accuracy, faster training – fewer parameters)
Tomas Mikolov, Facebook, 2016
MODEL | # hidden units | Perplexity
N-gram | - | 309
N-gram + cache | - | 229
SRN | 100 | 245
LSTM | 100 (x4 parameters) | 193
SCRN | 100 + 80 | 184
Results: Text8
• With 500 hidden units, LSTM is slightly better in perplexity (3%) than SCRN, but it
also has many more parameters
Tomas Mikolov, Facebook, 2016
MODEL | # hidden units | Perplexity
N-gram | - | 309
N-gram + cache | - | 229
SRN | 500 | 184
LSTM | 500 (x4 parameters) | 156
SCRN | 500 + 80 | 161
Discussion of Results
• SCRN accumulates longer history in the “slow” hidden layer: the same
as exponentially decaying cache model
• Empirically, LSTM performance correlates strongly with cache
(weighted bag-of-words)
• For very large (~infinite) training sets, SCRN seems to be the
preferable architecture: it is computationally very cheap
Tomas Mikolov, Facebook, 2016
Conclusion
• Simple tricks can overcome the vanishing and exploding gradient
problems
• State of the recurrent layer can represent longer short term memory,
but not the long term one (across millions of time steps)
• To represent true long term memory, we may need to develop models
with ability to grow in size (modify their own structure)
Tomas Mikolov, Facebook, 2016
Beyond Deep Learning
• Going beyond: what is it that RNNs and deep networks cannot model
efficiently?
• Surprisingly simple patterns! For example, memorization of a
variable-length sequence of symbols
Tomas Mikolov, Facebook, 2016
Beyond Deep Learning: Algorithmic Patterns
• Many complex patterns have short, finite description length in natural
language (or in any Turing-complete computational system)
• We call such patterns Algorithmic patterns
• Examples of algorithmic patterns: a^n b^n, sequence memorization,
addition of numbers learned from examples
• These patterns often cannot be learned with standard deep learning
techniques
Tomas Mikolov, Facebook, 2016
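As a concrete example of the first pattern above, a hypothetical data generator for the a^n b^n language; a model trained for next-symbol prediction on such strings must effectively learn to count the a's in order to predict where the b's end:

```python
import random

def sample_anbn(max_n=10, rng=random.Random(0)):
    """Return one a^n b^n string with n drawn uniformly from 1..max_n."""
    n = rng.randint(1, max_n)
    return "a" * n + "b" * n + "."   # '.' marks the end of a sequence

# Training data is just a stream of such strings, e.g. "aaabbb.", "ab.", ...
# The model sees it one symbol at a time and predicts the next symbol.
stream = "".join(sample_anbn() for _ in range(1000))
```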
Beyond Deep Learning: Algorithmic Patterns
• Among the myriad of complex tasks that are currently not solvable,
which ones should we focus on?
• We need to set an ambitious end goal and define a roadmap for how to
achieve it step by step
Tomas Mikolov, Facebook, 2016
A Roadmap towards
Machine Intelligence
Tomas Mikolov, Armand Joulin and Marco Baroni
Ultimate Goal for Communication-based AI
Can do almost anything:
• A machine that helps students to understand their homework
• Help researchers to find relevant information
• Write programs
• Help scientists in tasks that are currently too demanding (would
require hundreds of years of work to solve)
Tomas Mikolov, Facebook, 2016
The Roadmap
• We describe a minimal set of components we think the intelligent
machine will consist of
• Then, an approach to construct the machine
• And the requirements for the machine to be scalable
Tomas Mikolov, Facebook, 2016
Components of Intelligent machines
• Ability to communicate
• Motivation component
• Learning skills (further requires long-term memory), i.e. the ability to
modify itself to adapt to new problems
Tomas Mikolov, Facebook, 2016
Components of Framework
To build and develop intelligent machines, we need:
• An environment that can teach the machine basic communication skills and
learning strategies
• Communication channels
• Rewards
• Incremental structure
Tomas Mikolov, Facebook, 2016
The need for new tasks: simulated
environment
• There is no existing dataset known to us that would allow us to teach the
machine communication skills
• Careful design of the tasks, including how quickly the complexity is
growing, seems essential for success:
• If we add complexity too quickly, even a correctly implemented intelligent
machine can fail to learn
• By adding complexity too slowly, we may miss the final goals
Tomas Mikolov, Facebook, 2016
High-level description of the environment
Simulated environment:
• Learner
• Teacher
• Rewards
Scaling up:
• More complex tasks, fewer examples, less supervision
• Communication with real humans
• Real input signals (internet)
Tomas Mikolov, Facebook, 2016
Simulated environment - agents
• Environment: a simple script-based reactive agent that produces signals
for the learner and represents the world
• Learner: the intelligent machine which receives an input signal and a
reward signal, and produces an output signal to maximize the average
incoming reward
• Teacher: specifies tasks for Learner, first based on scripts, later to be
replaced by human users
Tomas Mikolov, Facebook, 2016
Simulated environment - communication
• Both Teacher and Environment write to Learner’s input channel
• Learner’s output channel influences its behavior in the Environment,
and can be used for communication with the Teacher
• Rewards are also part of the IO channels
Tomas Mikolov, Facebook, 2016
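A toy sketch of how the channels described above fit together; all class and function names here are hypothetical, and the echo reward is only a placeholder, not a task from the roadmap:

```python
class Learner:
    """The intelligent machine: consumes one input symbol and the current reward,
    emits one output symbol. A real Learner would learn from the reward signal."""
    def act(self, input_char, reward):
        return input_char   # trivial placeholder policy: echo the input

def run_episode(learner, teacher, steps=100):
    reward, output = 0.0, []
    for t in range(steps):
        input_char = teacher(t)                     # Teacher/Environment write to the input channel
        out_char = learner.act(input_char, reward)  # Learner writes to the output channel
        output.append(out_char)
        reward = 1.0 if out_char == input_char else 0.0   # toy reward on the IO channel
    return "".join(output)

# Example: a scripted teacher that repeats a fixed instruction string.
print(run_episode(Learner(), lambda t: "say abc. "[t % 9]))
```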
Visualization for better understanding
• Example of input / output streams and visualization:
Tomas Mikolov, Facebook, 2016
How to scale up: fast learners
• It is essential to develop a fast learner: we can easily build a machine
today that will “solve” simple tasks in the simulated world using a
myriad of trials, but this will not scale to complex problems
• In general, showing the Learner a new type of behavior and guiding it
through a few tasks should be enough for it to generalize to similar
tasks later
• There should be less and less need for direct supervision through
rewards
Tomas Mikolov, Facebook, 2016
How to scale up: adding humans
• A Learner capable of fast learning can start communicating with human
experts (us) who will teach it novel behavior
• Later, a pre-trained Learner with basic communication skills can be
used by human non-experts
Tomas Mikolov, Facebook, 2016
How to scale up: adding real world
• The Learner can gain access to the internet through its IO channels
• This can be done by teaching the Learner how to form a query in its
output stream
Tomas Mikolov, Facebook, 2016
The need for new techniques
Certain trivial patterns are nowadays hard to learn:
• The a^n b^n context-free language is out of scope for standard RNNs
• Sequence memorization breaks LSTM RNNs
• We show this in a recent paper, Inferring Algorithmic Patterns with
Stack-Augmented Recurrent Nets
Tomas Mikolov, Facebook, 2016
Scalability
For the machine to have any hope of scaling to more complex problems, we need:
• Long-term memory
• (Turing-) Complete and efficient computational model
• Incremental, compositional learning
• Fast learning from a small number of examples
• Decreasing amount of supervision through rewards
• Further discussed in: A Roadmap towards Machine Intelligence
http://arxiv.org/abs/1511.08130
Tomas Mikolov, Facebook, 2016
Some steps forward: Stack RNNs (Joulin &
Mikolov, 2015)
• Simple RNN extended with a long term memory module that the
neural net learns to control
• The idea itself is very old (from 80’s – 90’s)
• Our version is very simple and learns patterns with complexity far
exceeding what was shown before (though still very toy-like): much
less supervision, and it scales to more complex tasks
Tomas Mikolov, Facebook, 2016
Stack RNN
• Learns algorithms from examples
• Add structured memory to RNN:
• Trainable [read/write]
• Unbounded
• Actions: PUSH / POP / NO-OP
• Examples of memory structures: stacks, lists, queues, tapes, grids, …
Tomas Mikolov, Facebook, 2016
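A single-stack, scalar-cell sketch of the idea (the model in the paper is richer, and all names and shapes here are illustrative assumptions). The hidden state reads the stack top and emits a soft PUSH / POP / NO-OP distribution that blends the candidate stack updates, keeping everything differentiable:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def stack_rnn_step(x_t, h_prev, stack, A, R, U, P, D, W_act):
    """One step of a stack-augmented RNN sketch with a soft (differentiable) stack."""
    h_t = sigmoid(A @ x_t + R @ h_prev + P * stack[0])   # hidden state sees the stack top
    act = softmax(W_act @ h_t)                           # [p_push, p_pop, p_noop]
    push_val = sigmoid(D @ h_t)                          # value that would be pushed
    pushed = np.concatenate(([push_val], stack[:-1]))    # shift down, new top
    popped = np.concatenate((stack[1:], [0.0]))          # shift up, drop the old top
    new_stack = act[0] * pushed + act[1] * popped + act[2] * stack
    y_t = softmax(U @ h_t)                               # next-symbol prediction
    return h_t, new_stack, y_t
```

Because the action weights are continuous, the whole model can still be trained with plain SGD and backpropagation through time.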
Algorithmic Patterns
• Examples of simple algorithmic patterns generated by short programs
(grammars)
• The goal is to learn these patterns in an unsupervised way, just by
observing the example sequences
Tomas Mikolov, Facebook, 2016
Algorithmic Patterns - Counting
• Performance on simple counting tasks
• An RNN with a sigmoidal activation function cannot count
• Stack-RNN and LSTM can count
Tomas Mikolov, Facebook, 2016
Algorithmic Patterns - Sequences
• Sequence memorization and binary addition are out of scope for LSTMs
• The expandable memory of the stacks allows the model to learn the solution
Tomas Mikolov, Facebook, 2016
Binary Addition
• No supervision in training, just prediction
• Learns to store the digits, decide when to produce the output, and handle the carry
Tomas Mikolov, Facebook, 2016
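A sketch of how the addition task can be posed as pure next-symbol prediction (the exact encoding used in the paper may differ; this layout, with least-significant bits first, is an illustrative assumption):

```python
def addition_example(a, b):
    """Encode a + b = c as one character sequence, binary digits written LSB-first."""
    to_bits = lambda n: bin(n)[2:][::-1]   # e.g. 6 -> '110' -> LSB-first '011'
    return to_bits(a) + "+" + to_bits(b) + "=" + to_bits(a + b) + "."

# The model only ever predicts the next character in a stream of such strings;
# to get the part after '=' right, it must store the operands and track the carry.
print(addition_example(5, 3))   # -> '101+11=0001.'
```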
Stack RNNs: summary
The good:
• Turing-complete model of computation (with >=2 stacks)
• Learns some algorithmic patterns
• Has long term memory
• Simple model that works for some problems that break RNNs and LSTMs
• Reproducible: https://github.com/facebook/Stack-RNN
The bad:
• The long term memory is used only to store partial computation (i.e. learned skills are not
stored there yet)
• Does not seem to be a good model for incremental learning
• Stacks do not seem to be a very general choice for the topology of the memory
Tomas Mikolov, Facebook, 2016
Conclusion
To achieve true artificial intelligence, we need:
• An AI-complete goal
• A new set of tasks
• To develop new techniques
• To motivate more people to address these problems
Tomas Mikolov, Facebook, 2016
Contenu connexe

Tendances

Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Roelof Pieters
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLPSatyam Saxena
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information RetrievalRoelof Pieters
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersRoelof Pieters
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsRoelof Pieters
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsBenjamin Le
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learningananth
 
Talk from NVidia Developer Connect
Talk from NVidia Developer ConnectTalk from NVidia Developer Connect
Talk from NVidia Developer ConnectAnuj Gupta
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPindico data
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPMachine Learning Prague
 
Deep Learning for NLP Applications
Deep Learning for NLP ApplicationsDeep Learning for NLP Applications
Deep Learning for NLP ApplicationsSamiur Rahman
 
Deep Learning and Text Mining
Deep Learning and Text MiningDeep Learning and Text Mining
Deep Learning and Text MiningWill Stanton
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesFelipe Moraes
 
Deep Learning Models for Question Answering
Deep Learning Models for Question AnsweringDeep Learning Models for Question Answering
Deep Learning Models for Question AnsweringSujit Pal
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsRoelof Pieters
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Saurabh Kaushik
 
Practical Deep Learning for NLP
Practical Deep Learning for NLP Practical Deep Learning for NLP
Practical Deep Learning for NLP Textkernel
 
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systemsQi He
 

Tendances (20)

Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLP
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information Retrieval
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender Systems
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learning
 
Talk from NVidia Developer Connect
Talk from NVidia Developer ConnectTalk from NVidia Developer Connect
Talk from NVidia Developer Connect
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Deep Learning for NLP Applications
Deep Learning for NLP ApplicationsDeep Learning for NLP Applications
Deep Learning for NLP Applications
 
Deep Learning and Text Mining
Deep Learning and Text MiningDeep Learning and Text Mining
Deep Learning and Text Mining
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and Phrases
 
Deep Learning Models for Question Answering
Deep Learning Models for Question AnsweringDeep Learning Models for Question Answering
Deep Learning Models for Question Answering
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed models
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
 
Practical Deep Learning for NLP
Practical Deep Learning for NLP Practical Deep Learning for NLP
Practical Deep Learning for NLP
 
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
 

En vedette

Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Bhaskar Mitra
 
Neu-ir 2016: Opening note
Neu-ir 2016: Opening noteNeu-ir 2016: Opening note
Neu-ir 2016: Opening noteBhaskar Mitra
 
Using Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalUsing Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalBhaskar Mitra
 
A Proposal for Evaluating Answer Distillation from Web Data
A Proposal for Evaluating Answer Distillation from Web DataA Proposal for Evaluating Answer Distillation from Web Data
A Proposal for Evaluating Answer Distillation from Web DataBhaskar Mitra
 
Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)Bhaskar Mitra
 
Query Expansion with Locally-Trained Word Embeddings (ACL 2016)
Query Expansion with Locally-Trained Word Embeddings (ACL 2016)Query Expansion with Locally-Trained Word Embeddings (ACL 2016)
Query Expansion with Locally-Trained Word Embeddings (ACL 2016)Bhaskar Mitra
 
Neu-IR 2016: Lessons from the Trenches
Neu-IR 2016: Lessons from the TrenchesNeu-IR 2016: Lessons from the Trenches
Neu-IR 2016: Lessons from the TrenchesBhaskar Mitra
 
A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsBhaskar Mitra
 
WSDM2016報告会−論文紹介(Beyond Ranking:Optimizing Whole-Page Presentation)#yjwsdm
WSDM2016報告会−論文紹介(Beyond Ranking:Optimizing Whole-Page Presentation)#yjwsdmWSDM2016報告会−論文紹介(Beyond Ranking:Optimizing Whole-Page Presentation)#yjwsdm
WSDM2016報告会−論文紹介(Beyond Ranking:Optimizing Whole-Page Presentation)#yjwsdmYahoo!デベロッパーネットワーク
 
Tutorial on query auto-completion
Tutorial on query auto-completionTutorial on query auto-completion
Tutorial on query auto-completionYichen Feng
 
Interleaving - SIGIR 2016 presentation
Interleaving - SIGIR 2016 presentationInterleaving - SIGIR 2016 presentation
Interleaving - SIGIR 2016 presentationXin QIAN
 
Learning to Rank Personalized Search Results in Professional Networks
Learning to Rank Personalized Search Results in Professional NetworksLearning to Rank Personalized Search Results in Professional Networks
Learning to Rank Personalized Search Results in Professional NetworksViet Ha-Thuc
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jWilliam Lyon
 
Information Retrieval with Deep Learning
Information Retrieval with Deep LearningInformation Retrieval with Deep Learning
Information Retrieval with Deep LearningAdam Gibson
 
Introduction to word embeddings with Python
Introduction to word embeddings with PythonIntroduction to word embeddings with Python
Introduction to word embeddings with PythonPavel Kalaidin
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Hady Elsahar
 

En vedette (16)

Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
Neu-ir 2016: Opening note
Neu-ir 2016: Opening noteNeu-ir 2016: Opening note
Neu-ir 2016: Opening note
 
Using Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalUsing Text Embeddings for Information Retrieval
Using Text Embeddings for Information Retrieval
 
A Proposal for Evaluating Answer Distillation from Web Data
A Proposal for Evaluating Answer Distillation from Web DataA Proposal for Evaluating Answer Distillation from Web Data
A Proposal for Evaluating Answer Distillation from Web Data
 
Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
 
Query Expansion with Locally-Trained Word Embeddings (ACL 2016)
Query Expansion with Locally-Trained Word Embeddings (ACL 2016)Query Expansion with Locally-Trained Word Embeddings (ACL 2016)
Query Expansion with Locally-Trained Word Embeddings (ACL 2016)
 
Neu-IR 2016: Lessons from the Trenches
Neu-IR 2016: Lessons from the TrenchesNeu-IR 2016: Lessons from the Trenches
Neu-IR 2016: Lessons from the Trenches
 
A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word Embeddings
 
WSDM2016報告会−論文紹介(Beyond Ranking:Optimizing Whole-Page Presentation)#yjwsdm
WSDM2016報告会−論文紹介(Beyond Ranking:Optimizing Whole-Page Presentation)#yjwsdmWSDM2016報告会−論文紹介(Beyond Ranking:Optimizing Whole-Page Presentation)#yjwsdm
WSDM2016報告会−論文紹介(Beyond Ranking:Optimizing Whole-Page Presentation)#yjwsdm
 
Tutorial on query auto-completion
Tutorial on query auto-completionTutorial on query auto-completion
Tutorial on query auto-completion
 
Interleaving - SIGIR 2016 presentation
Interleaving - SIGIR 2016 presentationInterleaving - SIGIR 2016 presentation
Interleaving - SIGIR 2016 presentation
 
Learning to Rank Personalized Search Results in Professional Networks
Learning to Rank Personalized Search Results in Professional NetworksLearning to Rank Personalized Search Results in Professional Networks
Learning to Rank Personalized Search Results in Professional Networks
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4j
 
Information Retrieval with Deep Learning
Information Retrieval with Deep LearningInformation Retrieval with Deep Learning
Information Retrieval with Deep Learning
 
Introduction to word embeddings with Python
Introduction to word embeddings with PythonIntroduction to word embeddings with Python
Introduction to word embeddings with Python
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ?
 

Similaire à Recurrent Networks and Beyond: Understanding RNN Limitations and Exploring New Directions

Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRUananth
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Fwdays
 
5_RNN_LSTM.pdf
5_RNN_LSTM.pdf5_RNN_LSTM.pdf
5_RNN_LSTM.pdfFEG
 
Deep learning frameworks v0.40
Deep learning frameworks v0.40Deep learning frameworks v0.40
Deep learning frameworks v0.40Jessica Willis
 
Deep Learning Frameworks slides
Deep Learning Frameworks slides Deep Learning Frameworks slides
Deep Learning Frameworks slides Sheamus McGovern
 
[246]reasoning, attention and memory toward differentiable reasoning machines
[246]reasoning, attention and memory   toward differentiable reasoning machines[246]reasoning, attention and memory   toward differentiable reasoning machines
[246]reasoning, attention and memory toward differentiable reasoning machinesNAVER D2
 
Unit one ppt of deeep learning which includes Ann cnn
Unit one ppt of  deeep learning which includes Ann cnnUnit one ppt of  deeep learning which includes Ann cnn
Unit one ppt of deeep learning which includes Ann cnnkartikaursang53
 
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningTroubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningSergey Karayev
 
rsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningrsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningJeff Heaton
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendationsBalázs Hidasi
 
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...linshanleearchive
 
Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.Vishal Mishra
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchNatasha Latysheva
 
Matrix_Profile_Tutorial_Part1.pdf
Matrix_Profile_Tutorial_Part1.pdfMatrix_Profile_Tutorial_Part1.pdf
Matrix_Profile_Tutorial_Part1.pdfAndrea496281
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Fwdays
 
Implement LST perform LSTm stock Makrket Analysis
Implement LST perform LSTm stock Makrket AnalysisImplement LST perform LSTm stock Makrket Analysis
Implement LST perform LSTm stock Makrket AnalysisKv Sagar
 
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
Deep Learning For Practitioners,  lecture 2: Selecting the right applications...Deep Learning For Practitioners,  lecture 2: Selecting the right applications...
Deep Learning For Practitioners, lecture 2: Selecting the right applications...ananth
 

Similaire à Recurrent Networks and Beyond: Understanding RNN Limitations and Exploring New Directions (20)

Deeplearning in finance
Deeplearning in financeDeeplearning in finance
Deeplearning in finance
 
Image captioning
Image captioningImage captioning
Image captioning
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"
 
5_RNN_LSTM.pdf
5_RNN_LSTM.pdf5_RNN_LSTM.pdf
5_RNN_LSTM.pdf
 
Deep learning frameworks v0.40
Deep learning frameworks v0.40Deep learning frameworks v0.40
Deep learning frameworks v0.40
 
Deep Learning Frameworks slides
Deep Learning Frameworks slides Deep Learning Frameworks slides
Deep Learning Frameworks slides
 
[246]reasoning, attention and memory toward differentiable reasoning machines
[246]reasoning, attention and memory   toward differentiable reasoning machines[246]reasoning, attention and memory   toward differentiable reasoning machines
[246]reasoning, attention and memory toward differentiable reasoning machines
 
Unit one ppt of deeep learning which includes Ann cnn
Unit one ppt of  deeep learning which includes Ann cnnUnit one ppt of  deeep learning which includes Ann cnn
Unit one ppt of deeep learning which includes Ann cnn
 
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningTroubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
 
rsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningrsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morning
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
 
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
 
Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
 
Matrix_Profile_Tutorial_Part1.pdf
Matrix_Profile_Tutorial_Part1.pdfMatrix_Profile_Tutorial_Part1.pdf
Matrix_Profile_Tutorial_Part1.pdf
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
 
Implement LST perform LSTm stock Makrket Analysis
Implement LST perform LSTm stock Makrket AnalysisImplement LST perform LSTm stock Makrket Analysis
Implement LST perform LSTm stock Makrket Analysis
 
lec01.pptx
lec01.pptxlec01.pptx
lec01.pptx
 
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
Deep Learning For Practitioners,  lecture 2: Selecting the right applications...Deep Learning For Practitioners,  lecture 2: Selecting the right applications...
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
 

Plus de Bhaskar Mitra

Joint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and RecommendationJoint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and RecommendationBhaskar Mitra
 
What’s next for deep learning for Search?
What’s next for deep learning for Search?What’s next for deep learning for Search?
What’s next for deep learning for Search?Bhaskar Mitra
 
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...Bhaskar Mitra
 
Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Bhaskar Mitra
 
Multisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and RecommendationMultisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and RecommendationBhaskar Mitra
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Neural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressNeural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressBhaskar Mitra
 
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning TrackConformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning TrackBhaskar Mitra
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Duet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning TrackDuet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning TrackBhaskar Mitra
 
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and BeyondBenchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and BeyondBhaskar Mitra
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for RetrievalBhaskar Mitra
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Learning to Rank with Neural Networks
Learning to Rank with Neural NetworksLearning to Rank with Neural Networks
Learning to Rank with Neural NetworksBhaskar Mitra
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Bhaskar Mitra
 
Adversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalAdversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalBhaskar Mitra
 

Plus de Bhaskar Mitra (20)

Joint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and RecommendationJoint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and Recommendation
 
What’s next for deep learning for Search?
What’s next for deep learning for Search?What’s next for deep learning for Search?
What’s next for deep learning for Search?
 
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
 
Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...
 
Multisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and RecommendationMultisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and Recommendation
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Neural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressNeural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progress
 
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning TrackConformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Duet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning TrackDuet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning Track
 
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and BeyondBenchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for Retrieval
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Learning to Rank with Neural Networks
Learning to Rank with Neural NetworksLearning to Rank with Neural Networks
Learning to Rank with Neural Networks
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
 
Adversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalAdversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrieval
 

Dernier

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Dernier (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

Recurrent Networks and Beyond: Understanding RNN Limitations and Exploring New Directions

  • 1. Recurrent Networks and Beyond Tomas Mikolov, Facebook Neu-IR Workshop, Pisa, Italy 2016
  • 2. Goals of this talk • Explain recent success of recurrent networks • Understand better the concept of (longer) short term memory • Explore limitations of recurrent networks • Discuss what needs to be done to build machines that can understand language Tomas Mikolov, Facebook, 2016
  • 3. Brief History of Recurrent Nets – 80’s & 90’s • Recurrent network architectures were very popular in the 80’s and early 90’s (Elman, Jordan, Mozer, Hopfield, Parallel Distributed Processing group, …) • The main idea is very attractive: to re-use parameters and computation (usually over time) Tomas Mikolov, Facebook, 2016
  • 4. Simple RNN Architecture • Input layer, hidden layer with recurrent connections, and the output layer • In theory, the hidden layer can learn to represent unlimited memory • Also called Elman network (Finding structure in time, Elman 1990) Tomas Mikolov, Facebook, 2016
  • 5. Brief History of Recurrent Nets – 90’s - 2010 • After the initial excitement, recurrent nets vanished from the mainstream research • Despite being theoretically powerful models, RNNs were mostly considered as unstable to be trained • Some success was achieved at IDSIA with the Long Short Term Memory RNN architecture, but this model was too complex for others to reproduce easily Tomas Mikolov, Facebook, 2016
  • 6. Brief History of Recurrent Nets – 2010 - today • In 2010, it was shown that RNNs can significantly improve state-of- the-art in language modeling, machine translation, data compression and speech recognition (including strong commercial speech recognizer from IBM) • RNNLM toolkit was published to allow researchers to reproduce the results and extend the techniques • The key novel trick in RNNLM was trivial: to clip gradients to prevent instability of training Tomas Mikolov, Facebook, 2016
  • 7. Brief History of Recurrent Nets – 2010 - today • 21% - 24% reduction of WER on Wall Street Journal setup Tomas Mikolov, Facebook, 2016
  • 8. Brief History of Recurrent Nets – 2010 - today • Improvement from RNNLM over n-gram increases with more data! Tomas Mikolov, Facebook, 2016
  • 9. Brief History of Recurrent Nets – 2010 - today • Breakthrough result in 2011: 11% WER reduction over large system from IBM • Ensemble of big RNNLM models trained on a lot of data Tomas Mikolov, Facebook, 2016
  • 10. Brief History of Recurrent Nets – 2010 - today • RNNs became much more accessible through open-source implementations in general ML toolkits: • Theano • Torch • PyBrain • TensorFlow • … Tomas Mikolov, Facebook, 2016
  • 11. Recurrent Nets Today • Widely applied: • ASR (both acoustic and language models) • MT (language & translation & alignment models, joint models) • Many NLP applications • Video modeling, handwriting recognition, user intent prediction, … • Downside: for many problems RNNs are too powerful, and models are becoming unnecessarily complex • Often, complicated RNN architectures are preferred for the wrong reasons (easier to get a paper published and attract attention) Tomas Mikolov, Facebook, 2016
  • 12. Longer short term memory in simple RNNs • How to add longer memory to RNNs without unnecessary complexity • Paper: Learning Longer Memory in Recurrent Neural Networks (Mikolov, Joulin, Chopra, Mathieu, Ranzato, ICLR Workshop 2015) Tomas Mikolov, Facebook, 2016
  • 13. Recurrent Network – Elman Architecture • Also known as Simple Recurrent Network (SRN) • Input layer x_t, hidden layer h_t, output y_t • Weight matrices A, R, U Tomas Mikolov, Facebook, 2016
  • 14. Recurrent Network – Elman Architecture • Input layer x_t, hidden layer h_t, output y_t • Weight matrices A, R, U • h_t = σ(A x_t + R h_{t-1}) • σ(x) = 1 / (1 + e^{-x}) • y_t = f(U h_t), where f(·) is the softmax function Tomas Mikolov, Facebook, 2016
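For concreteness, here is a minimal NumPy sketch of this forward step; the toy dimensions and random weights are illustrative placeholders, not values from the talk:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def srn_step(x_t, h_prev, A, R, U):
    """One Elman/SRN step: h_t = sigmoid(A x_t + R h_{t-1}), y_t = softmax(U h_t)."""
    h_t = sigmoid(A @ x_t + R @ h_prev)
    y_t = softmax(U @ h_t)
    return h_t, y_t

# Toy dimensions (illustrative): vocabulary of 10 symbols, 20 hidden units
V, H = 10, 20
rng = np.random.default_rng(0)
A, R, U = rng.normal(0, 0.1, (H, V)), rng.normal(0, 0.1, (H, H)), rng.normal(0, 0.1, (V, H))
x_t = np.zeros(V); x_t[3] = 1.0            # one-hot encoding of the current word
h_t, y_t = srn_step(x_t, np.zeros(H), A, R, U)
```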
  • 15. Simple Recurrent Net Problems • Backpropagation through time algorithm + stochastic gradient descent is commonly used for training (Rumelhart et al, 1985) • Gradients can either vanish or explode (Hochreiter 1991; Bengio 1994) Tomas Mikolov, Facebook, 2016
  • 16. Simple Recurrent Net: Exploding Gradients • The gradients explode rarely, but this can have disastrous effects • Simple “hack” is to clip gradients to stay within some range • This prevents exponential growth (which would otherwise lead to a giant step in the weight update) • One can also normalize the gradients, or discard the weight updates that are too big Tomas Mikolov, Facebook, 2016
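A sketch of the two variants mentioned above; the thresholds are arbitrary placeholders, not values from the talk:

```python
import numpy as np

def clip_elementwise(grad, limit=15.0):
    """Clip each component of the gradient into [-limit, limit]; the limit is a placeholder."""
    return np.clip(grad, -limit, limit)

def clip_by_norm(grad, max_norm=5.0):
    """Alternative mentioned on the slide: rescale the whole gradient if its norm is too big."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad
```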
  • 17. Simple Recurrent Net: Vanishing Gradients • Most of the time, the gradients quickly vanish (after 5-10 steps of backpropagation through time) • This may not be a problem of SGD, but of the architecture of the SRN Tomas Mikolov, Facebook, 2016
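To see why the gradients die out, one can multiply the BPTT Jacobians of a random sigmoid SRN and watch the norm collapse; this small demo uses arbitrary weights and drops the inputs, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
H, T = 50, 20
R = rng.normal(0, 0.1, (H, H))            # random recurrent matrix, inputs omitted for simplicity

hs = [rng.random(H)]                      # forward pass: h_{t+1} = sigmoid(R h_t)
for _ in range(T):
    hs.append(1.0 / (1.0 + np.exp(-(R @ hs[-1]))))

grad = np.ones(H)                         # pretend dLoss/dh_T is all ones
for t in range(T, 0, -1):                 # backpropagation through time
    jac = (hs[t] * (1.0 - hs[t]))[:, None] * R   # Jacobian dh_t/dh_{t-1} for sigmoid units
    grad = jac.T @ grad
    print(f"{T - t + 1:2d} steps back: |grad| = {np.linalg.norm(grad):.2e}")
```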
  • 18. Simple Recurrent Net: Vanishing Gradients • What recurrent architecture would be easier to train to capture longer term patterns? • Instead of a fully connected recurrent matrix, we can use an architecture where each neuron is connected only to the input and to itself • Old idea (Jordan 1987; Mozer 1989) Tomas Mikolov, Facebook, 2016
  • 19. Combination of both ideas: Elman + Mozer • Part of the hidden layer is fully connected, part is diagonal (self-connections) • Can be seen as RNN with two hidden layers • Or as RNN with partially diagonal recurrent matrix (+ linear hidden units) Tomas Mikolov, Facebook, 2016
  • 20. Combination of both ideas: Elman + Mozer • The 𝛼 value can be learned, or kept fixed close to 1 (we used 0.95) • The 𝑃 matrix is optional (usually helps a bit) Tomas Mikolov, Facebook, 2016
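Putting slides 18–20 together, one step of the combined model might look like the sketch below. It roughly follows the formulation in the ICLR 2015 workshop paper; the matrix names B, P, V for the slow layer are taken from that formulation and the whole thing should be read as an illustrative sketch rather than the reference implementation:

```python
import numpy as np

def scrn_step(x_t, h_prev, s_prev, A, R, B, P, U, V, alpha=0.95):
    """One step of the structurally constrained net: a slow, self-connected context layer
    s_t plus the usual fast hidden layer h_t (roughly per the ICLR 2015 workshop paper)."""
    s_t = (1.0 - alpha) * (B @ x_t) + alpha * s_prev              # slow layer: self-connection fixed at alpha
    h_t = 1.0 / (1.0 + np.exp(-(P @ s_t + A @ x_t + R @ h_prev))) # fast, fully connected hidden layer
    z = U @ h_t + V @ s_t                                         # both layers feed the output
    e = np.exp(z - z.max())
    return h_t, s_t, e / e.sum()                                  # softmax over the vocabulary
```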
  • 21. Structurally Constrained Recurrent Net • Because we constrain the architecture of SRN, we further denote the model as Structurally Constrained Recurrent Net (SCRN) • Alternative name is “slow recurrent nets”, as the state of the diagonal layer changes slowly Q: Wouldn’t it be enough to initialize the recurrent matrix to be diagonal? A: No. This would degrade back to normal RNN and not learn longer memory. Tomas Mikolov, Facebook, 2016
  • 22. Results • Language modeling experiments: Penn Treebank, Text8 • Longer memory in language models is commonly called cache / topic • Comparison to Long Short Term Memory RNNs (currently popular but quite complicated architecture that can learn longer term patterns) • Datasets & code: http://github.com/facebook/SCRNNs (link is in the paper) Tomas Mikolov, Facebook, 2016
  • 23. Results: Penn Treebank language modeling • Gain from SCRN / LSTM over simpler recurrent net is similar to gain from cache • LSTM has 3 gates for each hidden unit, and thus 4x more parameters need to be accessed during training for the given hidden layer size (=> slower to train) • SCRN with 100 fully connected and 40 self-connected neurons is only slightly more expensive to train than SRN Tomas Mikolov, Facebook, 2016

    MODEL            # hidden units         Perplexity
    N-gram           -                      141
    N-gram + cache   -                      125
    SRN              100                    129
    LSTM             100 (x4 parameters)    115
    SCRN             100 + 40               115
  • 24. Results: Text8 • Text8: Wikipedia text (~17M words), much stronger effect from cache • Big gain for both SCRN & LSTM over SRN • For small models, SCRN seems to be superior (simpler architecture, better accuracy, faster training – fewer parameters) Tomas Mikolov, Facebook, 2016

    MODEL            # hidden units         Perplexity
    N-gram           -                      309
    N-gram + cache   -                      229
    SRN              100                    245
    LSTM             100 (x4 parameters)    193
    SCRN             100 + 80               184
  • 25. Results: Text8 • With 500 hidden units, LSTM is slightly better in perplexity (3%) than SCRN, but it also has many more parameters Tomas Mikolov, Facebook, 2016

    MODEL            # hidden units         Perplexity
    N-gram           -                      309
    N-gram + cache   -                      229
    SRN              500                    184
    LSTM             500 (x4 parameters)    156
    SCRN             500 + 80               161
  • 26. Discussion of Results • SCRN accumulates longer history in the “slow” hidden layer: the same as exponentially decaying cache model • Empirically, LSTM performance correlates strongly with cache (weighted bag-of-words) • For very large (~infinite) training sets, SCRN seems to be the preferable architecture: it is computationally very cheap Tomas Mikolov, Facebook, 2016
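The “exponentially decaying cache” the slide refers to can be written down in a few lines; the decay and interpolation weights below are illustrative placeholders, not numbers from the experiments:

```python
from collections import defaultdict

class DecayingCache:
    """Exponentially decaying unigram cache interpolated with a base LM:
    p(w | history) ~ (1 - lam) * p_base(w) + lam * cache(w).
    The decay and interpolation weights are illustrative placeholders."""
    def __init__(self, decay=0.99, lam=0.2):
        self.decay, self.lam = decay, lam
        self.counts = defaultdict(float)
        self.total = 0.0

    def update(self, word):
        for w in self.counts:                 # decay every stored count
            self.counts[w] *= self.decay
        self.total = self.total * self.decay + 1.0
        self.counts[word] += 1.0

    def prob(self, word, p_base):
        cache_p = self.counts.get(word, 0.0) / self.total if self.total > 0 else 0.0
        return (1.0 - self.lam) * p_base + self.lam * cache_p
```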
  • 27. Conclusion • Simple tricks can overcome the vanishing and exploding gradient problems • State of the recurrent layer can represent longer short term memory, but not the long term one (across millions of time steps) • To represent true long term memory, we may need to develop models with ability to grow in size (modify their own structure) Tomas Mikolov, Facebook, 2016
  • 28. Beyond Deep Learning • Going beyond: what can RNNs and deep networks not model efficiently? • Surprisingly simple patterns! For example, memorization of a variable-length sequence of symbols Tomas Mikolov, Facebook, 2016
  • 29. Beyond Deep Learning: Algorithmic Patterns • Many complex patterns have short, finite description length in natural language (or in any Turing-complete computational system) • We call such patterns Algorithmic patterns • Examples of algorithmic patterns: a^n b^n, sequence memorization, addition of numbers learned from examples • These patterns often cannot be learned with standard deep learning techniques Tomas Mikolov, Facebook, 2016
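As a concrete example of such a pattern, here is a trivial generator for a^n b^n strings; a model trained only to predict the next symbol effectively has to count the a's to get the b-run and the end of the string right (the helper below is hypothetical, not taken from the paper):

```python
import random

def anbn_example(max_n=10, rng=random):
    """One string from the a^n b^n language; only the b-run and the terminating '.'
    are predictable, so a next-symbol predictor has to count the a's."""
    n = rng.randint(1, max_n)
    return "a" * n + "b" * n + "."

print(anbn_example())   # e.g. 'aaaabbbb.'
```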
  • 30. Beyond Deep Learning: Algorithmic Patterns • Among the myriad of complex tasks that are currently not solvable, which ones should we focus on? • We need to set an ambitious end goal and define a roadmap for how to achieve it step by step Tomas Mikolov, Facebook, 2016
  • 31. A Roadmap towards Machine Intelligence Tomas Mikolov, Armand Joulin and Marco Baroni
  • 32. Ultimate Goal for Communication-based AI Can do almost anything: • Machine that helps students to understand their homework • Help researchers find relevant information • Write programs • Help scientists with tasks that are currently too demanding (would require hundreds of years of work to solve) Tomas Mikolov, Facebook, 2016
  • 33. The Roadmap • We describe a minimal set of components we think the intelligent machine will consist of • Then, an approach to construct the machine • And the requirements for the machine to be scalable Tomas Mikolov, Facebook, 2016
  • 34. Components of Intelligent machines • Ability to communicate • Motivation component • Learning skills (further requires long-term memory), i.e. the ability to modify itself to adapt to new problems Tomas Mikolov, Facebook, 2016
  • 35. Components of Framework To build and develop intelligent machines, we need: • An environment that can teach the machine basic communication skills and learning strategies • Communication channels • Rewards • Incremental structure Tomas Mikolov, Facebook, 2016
  • 36. The need for new tasks: simulated environment • There is no existing dataset known to us that would allow us to teach the machine communication skills • Careful design of the tasks, including how quickly the complexity grows, seems essential for success: • If we add complexity too quickly, even a correctly implemented intelligent machine can fail to learn • By adding complexity too slowly, we may miss the final goals Tomas Mikolov, Facebook, 2016
  • 37. High-level description of the environment Simulated environment: • Learner • Teacher • Rewards Scaling up: • More complex tasks, fewer examples, less supervision • Communication with real humans • Real input signals (internet) Tomas Mikolov, Facebook, 2016
  • 38. Simulated environment - agents • Environment: simple script-based reactive agent that produces signals for the learner, represents the world • Learner: the intelligent machine which receives input signal, reward signal and produces output signal to maximize average incoming reward • Teacher: specifies tasks for Learner, first based on scripts, later to be replaced by human users Tomas Mikolov, Facebook, 2016
  • 39. Simulated environment - communication • Both Teacher and Environment write to Learner’s input channel • Learner’s output channel influences its behavior in the Environment, and can be used for communication with the Teacher • Rewards are also part of the IO channels Tomas Mikolov, Facebook, 2016
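A purely illustrative toy of the IO loop described on slides 38–39: the Teacher writes to the Learner's input channel, the Learner's output goes back, and a scalar reward accompanies each step. Every name here (ScriptedTeacher, RandomLearner, the string protocol) is hypothetical and merely stands in for the scripted environment:

```python
import random

class ScriptedTeacher:
    """Stands in for the scripted Teacher: issues a task and scores the Learner's reply."""
    def emit(self):
        return "say: hello."
    def score(self, reply):
        return 1.0 if reply == "hello." else 0.0

class RandomLearner:
    """Placeholder Learner that ignores its input; a real Learner would adapt from the reward."""
    def act(self, message, reward):
        return random.choice(["hello.", "bye."])

teacher, learner = ScriptedTeacher(), RandomLearner()
reward, total = 0.0, 0.0
for _ in range(10):
    reply = learner.act(teacher.emit(), reward)   # Teacher writes to the input channel
    reward = teacher.score(reply)                 # reward arrives on the same IO stream
    total += reward
print("average reward:", total / 10)
```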
  • 40. Visualization for better understanding • Example of input / output streams and visualization: Tomas Mikolov, Facebook, 2016
  • 41. How to scale up: fast learners • It is essential to develop a fast learner: we can easily build a machine today that will “solve” simple tasks in the simulated world using a myriad of trials, but this will not scale to complex problems • In general, showing the Learner a new type of behavior and guiding it through a few tasks should be enough for it to generalize to similar tasks later • There should be less and less need for direct supervision through rewards Tomas Mikolov, Facebook, 2016
  • 42. How to scale up: adding humans • Learner capable of fast learning can start communicating with human experts (us) who will teach it novel behavior • Later, a pre-trained Learner with basic communication skills can be used by human non-experts Tomas Mikolov, Facebook, 2016
  • 43. How to scale up: adding real world • Learner can gain access to internet through its IO channels • This can be done by teaching the Learner how to form a query in its output stream Tomas Mikolov, Facebook, 2016
  • 44. The need for new techniques Certain trivial patterns are nowadays hard to learn: • The a^n b^n context-free language is out of scope for standard RNNs • Sequence memorization breaks LSTM RNNs • We show this in a recent paper: Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets Tomas Mikolov, Facebook, 2016
  • 45. Scalability For the machine to have a chance of scaling to more complex problems, we need: • Long-term memory • (Turing-) Complete and efficient computational model • Incremental, compositional learning • Fast learning from a small number of examples • Decreasing amount of supervision through rewards • Further discussed in: A Roadmap towards Machine Intelligence http://arxiv.org/abs/1511.08130 Tomas Mikolov, Facebook, 2016
  • 46. Some steps forward: Stack RNNs (Joulin & Mikolov, 2015) • Simple RNN extended with a long term memory module that the neural net learns to control • The idea itself is very old (from 80’s – 90’s) • Our version is very simple and learns patterns with complexity far exceeding what was shown before (though still very toyish): much less supervision, scales to more complex tasks Tomas Mikolov, Facebook, 2016
  • 47. Stack RNN • Learns algorithms from examples • Add structured memory to RNN: • Trainable [read/write] • Unbounded • Actions: PUSH / POP / NO-OP • Examples of memory structures: stacks, lists, queues, tapes, grids, … Tomas Mikolov, Facebook, 2016
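A soft (differentiable) stack update in the spirit of Stack RNNs: the controller's hidden state chooses a mixture over PUSH / POP / NO-OP and the stack contents are blended accordingly. The weight names, the fixed depth, and this exact blending are an illustrative sketch under my own assumptions, not the reference implementation from the repository:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def stack_update(stack, h_t, W_action, W_write):
    """Blend PUSH / POP / NO-OP outcomes of a depth-limited stack, weighted by the
    action probabilities produced from the hidden state h_t (illustrative sketch)."""
    a_push, a_pop, a_noop = softmax(W_action @ h_t)          # action probabilities
    write_val = 1.0 / (1.0 + np.exp(-(W_write @ h_t)))       # value a PUSH would place on top
    pushed = np.concatenate((write_val, stack[:-1]))         # PUSH: shift everything down
    popped = np.concatenate((stack[1:], [0.0]))              # POP: shift everything up
    return a_push * pushed + a_pop * popped + a_noop * stack

# Toy usage: depth-5 stack controlled by an 8-unit hidden state
rng = np.random.default_rng(0)
stack, h = np.zeros(5), rng.random(8)
stack = stack_update(stack, h, rng.normal(0, 0.1, (3, 8)), rng.normal(0, 0.1, (1, 8)))
```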
  • 48. Algorithmic Patterns • Examples of simple algorithmic patterns generated by short programs (grammars) • The goal is to learn these patterns without supervision, just by observing the example sequences Tomas Mikolov, Facebook, 2016
  • 49. Algorithmic Patterns - Counting • Performance on simple counting tasks • RNN with sigmoidal activation function cannot count • Stack-RNN and LSTM can count Tomas Mikolov, Facebook, 2016
  • 50. Algorithmic Patterns - Sequences • Sequence memorization and binary addition are out of scope for LSTMs • The expandable memory of stacks allows the model to learn the solution Tomas Mikolov, Facebook, 2016
  • 51. Binary Addition • No supervision in training, just prediction • Learns to: store digits, when to produce output, carry Tomas Mikolov, Facebook, 2016
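To make "just prediction" concrete, training data for this task can be laid out as plain strings and the model asked only to predict each next symbol. The encoding below (least significant bit first, '+' and '=' as separators) is one possible choice and not necessarily the one used in the Stack-RNN experiments:

```python
import random

def addition_example(max_bits=8, rng=random):
    """One training string for binary addition as pure next-symbol prediction
    (illustrative encoding: digits written least-significant-bit first)."""
    a = rng.randrange(2 ** max_bits)
    b = rng.randrange(2 ** max_bits)
    lsb = lambda n: bin(n)[2:][::-1]      # binary digits, least significant bit first
    return f"{lsb(a)}+{lsb(b)}={lsb(a + b)}."

print(addition_example())   # e.g. '101+11=0001.'  (5 + 3 = 8, bits written LSB-first)
```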
  • 52. Stack RNNs: summary The good: • Turing-complete model of computation (with >=2 stacks) • Learns some algorithmic patterns • Has long term memory • Simple model that works for some problems that break RNNs and LSTMs • Reproducible: https://github.com/facebook/Stack-RNN The bad: • The long term memory is used only to store partial computation (i.e. learned skills are not stored there yet) • Does not seem to be a good model for incremental learning • Stacks do not seem to be a very general choice for the topology of the memory Tomas Mikolov, Facebook, 2016
  • 53. Conclusion To achieve true artificial intelligence, we need: • AI-complete goal • New set of tasks • Develop new techniques • Motivate more people to address these problems Tomas Mikolov, Facebook, 2016