Memory Networks, Neural Turing Machines,
and Question Answering
Akram El-Korashy (Max Planck Institute for Informatics)
November 30, 2015
Deep Learning Seminar.
Papers by Weston et al. (ICLR2015), Graves et al. (2014), and
Sukhbaatar et al. (2015)
Outline
1 Introduction
Intuition and resemblance to human cognition
What does it look like?
2 QA Experiments, End-to-End
Architecture - MemN2N
Training
Baselines and Results
3 QA Experiments, Strongly Supervised
Architecture - MemNN
Training
Results
4 NTM code induction experiments
Intuition and resemblance to human cognition
Why memory?
Human working memory is a capacity for short-term storage of
information and its rule-based manipulation. An NTM1 therefore
resembles a working memory system, as it is designed to solve
tasks that require the application of approximate rules to
“rapidly-created variables”.
1 Neural Turing Machine. I will use the term interchangeably with Memory
Networks, depending on which paper I am citing.
Why memory? Why not RNNs and LSTMs?
The memory in these models is the state of the network, which
is latent (i.e., hidden; no explicit access) and inherently
unstable over long timescales. [Sukhbaatar2015]
Unlike a standard network, an NTM interacts with a memory matrix
using selective read and write operations that can focus on
(almost) a single memory location. [Graves2014]
Why memory networks? How about attention models with RNN
encoders/decoders?
The memory model is indeed analogous to the attention
mechanisms introduced for machine translation.
Main differences
In a memory network model, the query can be made over
multiple sentences, unlike machine translation.
The memory model makes several hops over the memory
before producing an output.
The network architecture of the memory scoring is a
simple linear layer, as opposed to a sophisticated gated
architecture in previous work.
Why memory? What’s the main usage?
Memory as non-compact storage
Explicitly update memory slots m_i at test time by making use of
a “generalization” component that determines “what” is to be
stored from input x, and “where” to store it (choosing among
the memory slots).
Storing stories for Question Answering
Given a story (i.e., a sequence of sentences), the output
component of the memory network can learn scoring functions
(i.e., similarity measures) between query sentences and memory
slots holding previous sentences.
What does it look like?
Overview of a memory model
A memory model that is trained only end-to-end.
The trained model takes a set of inputs x_1, ..., x_n to be stored in
the memory, a query q, and outputs an answer a.
Each of x_i, q, a contains symbols coming from a dictionary
with V words.
All x are written to memory up to a fixed buffer size; a continuous
representation is then found for the x and q.
The continuous representation is then processed via
multiple hops to output a.
This allows back-propagation of the error signal through
multiple memory accesses back to the input during training.
A, B, C are embedding matrices (of size d × V) used to
convert the inputs to the d-dimensional vectors m_i.
A match is computed between the embedded query u and each memory
m_i by taking the inner product followed by a softmax: p_i = Softmax(u^T m_i).
The response vector o from the memory is the weighted
sum o = Σ_i p_i c_i.
The final prediction (answer to the query) is computed with
the help of a weight matrix as â = Softmax(W(o + u)).
Figure: A single layer, and a three-layer memory model
[Sukhbaatar2015]
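To make the hop concrete, here is a minimal NumPy sketch of a single layer
under bag-of-words sentence encoding. The matrix names follow the slide's
notation; the toy dimensions and random parameters are purely illustrative,
not taken from the paper.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def memn2n_single_hop(story, query, A, B, C, W):
    # story: list of sentences, each a list of word indices; query: word indices.
    # A, B, C are d x V embedding matrices; W is a V x d output matrix.
    u = B[:, query].sum(axis=1)                         # embedded query u
    m = np.stack([A[:, s].sum(axis=1) for s in story])  # input memories m_i (BoW)
    c = np.stack([C[:, s].sum(axis=1) for s in story])  # output memories c_i
    p = softmax(m @ u)                                  # p_i = Softmax(u^T m_i)
    o = p @ c                                           # o = sum_i p_i c_i
    return softmax(W @ (o + u))                         # a_hat = Softmax(W(o + u))

# Toy usage with random parameters:
rng = np.random.default_rng(0)
d, V = 20, 177
A, B, C = (rng.normal(size=(d, V)) for _ in range(3))
W = rng.normal(size=(V, d))
print(memn2n_single_hop([[1, 2, 3], [4, 5]], [4, 6], A, B, C, W).shape)  # (V,)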
QA Experiments, End-to-End
Synthetic QA tasks, supporting subset
There are a total of 20 different types of tasks that test
different forms of reasoning and deduction.
Note that for each question, only some subset of the
statements contains information needed for the answer; the
others are essentially irrelevant distractors (e.g., the
first sentence in the first example).
In the Memory Networks of Weston et al., this supporting
subset was explicitly indicated to the model during training.
In what is called end-to-end training of memory networks,
this information is no longer provided.
A task is a set of example problems. A problem is a set of
I sentences x_i where I ≤ 320, a question q and an answer a.
The vocabulary is small, of size V = 177. Two versions of the
data are used: one with 1,000 training problems per task,
and one with 10,000 per task.
Figure: A given QA task consists of a set of statements, followed by a
question whose answer is typically a single word. [Sukhbaatar2015]
Architecture - MemN2N
Model Architecture
K = 3 hops were used.
Adjacent weight sharing was used to ease training and reduce
the number of parameters.
Adjacent weight tying
1 The output embedding of a layer is the input embedding of the
layer above (A^{k+1} = C^k).
2 Answer prediction uses the final output embedding
(W^T = C^K).
3 The question embedding is the input embedding of the first
layer (B = A^1).
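Under these three constraints, the whole K-hop model collapses to a single
stack of K+1 embedding matrices. The helper below is my own illustration of
that reduction (BoW encoding assumed), not the reference implementation.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bow(E, word_ids):
    # Bag-of-words sentence embedding: sum of word columns of E (d x V).
    return E[:, word_ids].sum(axis=1)

def memn2n_adjacent(story, query, E):
    # E is a list of K+1 embedding matrices; under adjacent tying
    # A_k = E[k-1], C_k = E[k], B = A_1 = E[0], and W^T = C_K = E[-1].
    K = len(E) - 1
    u = bow(E[0], query)                                 # B = A_1
    for k in range(1, K + 1):
        m = np.stack([bow(E[k - 1], s) for s in story])  # A_k (= C_{k-1})
        c = np.stack([bow(E[k], s) for s in story])      # C_k
        u = u + softmax(m @ u) @ c                       # u <- u + o (hop update)
    return softmax(E[-1].T @ u)                          # W^T = C_K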
Sentence Representation, Temporal Encoding
Two different sentence representations are used: bag-of-words
(BoW) and Position Encoding (PE).
BoW embeds each word and sums the resulting vectors:
m_i = Σ_j A x_ij.
PE encodes the position of each word using a column vector
l_j, where l_kj = (1 − j/J) − (k/d)(1 − 2j/J) and J is the
number of words in the sentence.
Temporal Encoding: modify the memory vector with a
special matrix that encodes temporal information. 2
Now m_i = Σ_j A x_ij + T_A(i), where T_A(i) is the i-th row of a
special temporal matrix T_A.
All the T matrices are learned during training. They are
subject to the same sharing constraints as A and C.
2 There isn’t enough detail on what constraints this matrix should be
subject to, if any.
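A short NumPy sketch of the PE formula; the elementwise weighting
m_i = Σ_j l_j ∘ (A x_ij) follows the use of l_j described above, and should
be read as an illustration rather than the authors' code.

import numpy as np

def position_encoding(J, d):
    # l_kj = (1 - j/J) - (k/d)(1 - 2j/J), with 1-based word position j
    # and embedding dimension k, as in the formula above.
    j = np.arange(1, J + 1)[None, :]   # shape (1, J)
    k = np.arange(1, d + 1)[:, None]   # shape (d, 1)
    return (1 - j / J) - (k / d) * (1 - 2 * j / J)   # shape (d, J)

def pe_embed(A, word_ids):
    # PE sentence embedding: weight each word vector by its position column.
    vecs = A[:, word_ids]              # (d, J) word vectors
    d, J = vecs.shape
    return (position_encoding(J, d) * vecs).sum(axis=1)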
Training
Loss function and learning parameters
The embedding matrices A, B and C, as well as W, are jointly
learned.
The loss function is a standard cross-entropy between â and
the true label a.
Stochastic gradient descent is used with a learning rate of
η = 0.01, with annealing.
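In code, the loss and the learning-rate schedule look roughly as follows.
The stepwise halving schedule is an assumption for illustration; the slide
only states η = 0.01 with annealing.

import numpy as np

def cross_entropy(a_hat, a_idx):
    # Standard cross-entropy between the predicted distribution a_hat
    # (output of the final softmax) and the true answer index a_idx.
    return -np.log(a_hat[a_idx] + 1e-12)

def learning_rate(epoch, eta0=0.01, every=25, factor=0.5):
    # Illustrative stepwise annealing starting from eta = 0.01; the
    # interval and factor are assumptions, not taken from the slide.
    return eta0 * factor ** (epoch // every)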
Parameters and Techniques
RN: learn time invariance by injecting random noise to
regularize T_A.
LS: linear start: remove all softmax layers except the answer
prediction layer, and re-apply them once the validation loss
stops decreasing. (LS uses a learning rate of η = 0.005 instead
of the 0.01 used for normal training.)
LW: layer-wise, RNN-like weight tying (otherwise,
adjacent weight tying).
BoW or PE: sentence representation.
joint: training on all 20 tasks jointly vs. independently.
[Sukhbaatar2015]
Baselines and Results
Take-home message: more memory hops give improved
performance.
Take-home message: joint training on the various tasks
sometimes helps.
Figure: All variants of the end-to-end trained memory model
comfortably beat the weakly supervised baseline methods.
[Sukhbaatar2015]
Set of Supporting Facts
Figure: Instances of successful prediction of the supporting
sentences.
QA Experiments, Strongly Supervised
Architecture - MemNN
IGOR
The memory network consists of a memory m and four learned
components:
1 I (input feature map): converts the incoming input to the
internal feature representation.
2 G (generalization): updates old memories given the new
input.
3 O (output feature map): produces a new output, given the
new input and the current memory state.
4 R (response): converts the output into the desired response
format.
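As a schematic, the four components can be wired together as below. This is
a hedged sketch of the I/G/O/R decomposition with each component passed in
as a callable, not the authors' implementation.

class MemNN:
    def __init__(self, I, G, O, R):
        self.I, self.G, self.O, self.R = I, G, O, R
        self.memory = []

    def answer(self, x):
        features = self.I(x)                          # input feature map
        self.memory = self.G(self.memory, features)   # generalization: update memory
        output = self.O(features, self.memory)        # output features from memory
        return self.R(output)                         # decode to response format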
Model Flow
The core of inference lies in the O and R modules. The O
module produces output features by finding k supporting
memories given x.
For k = 1, the highest scoring supporting memory is
retrieved: o_1 = O_1(x, m) = argmax_{i=1,...,N} s_O(x, m_i).
For k = 2, a second supporting memory is additionally
computed: o_2 = O_2(x, m) = argmax_{i=1,...,N} s_O([x, m_{o_1}], m_i).
In the single-word response setting, where W is the set of
all words in the dictionary, the response is
r = argmax_{w∈W} s_R([x, m_{o_1}, m_{o_2}], w).
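A minimal sketch of this k = 2 flow. The scoring functions s_O and s_R are
assumed given (in MemNN they are learned); Python lists stand in for the
concatenations [x, m_o1] and [x, m_o1, m_o2].

def memnn_infer_k2(x, memory, vocab, s_O, s_R):
    # Two successive argmax lookups over memory slots with s_O,
    # then a word-level argmax over the dictionary with s_R.
    o1 = max(range(len(memory)), key=lambda i: s_O([x], memory[i]))
    o2 = max(range(len(memory)), key=lambda i: s_O([x, memory[o1]], memory[i]))
    return max(vocab, key=lambda w: s_R([x, memory[o1], memory[o2]], w))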
Training
Max-margin, SGD
Supporting-sentence annotations are available as part of the training
data. Thus, the scoring functions are trained by minimizing a margin
ranking loss over the model parameters U_O and U_R using SGD.
Figure: For a given question x with true response r and supporting
sentences m_{O_1} and m_{O_2} (i.e., k = 2), the margin ranking loss is
minimized over the parameters U_O and U_R, where f̄, f̄′ and r̄ range over
all choices other than the correct labels, and γ is the margin.
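The figure showed the loss expression itself. Reconstructed from Weston et
al. (2015), it has the following shape (my transcription of the paper's
formula, not a verbatim quote from the slide):

\begin{align*}
&\sum_{\bar f \neq m_{O_1}} \max\big(0,\ \gamma - s_O(x, m_{O_1}) + s_O(x, \bar f)\big) \\
&+ \sum_{\bar f' \neq m_{O_2}} \max\big(0,\ \gamma - s_O([x, m_{O_1}], m_{O_2}) + s_O([x, m_{O_1}], \bar f')\big) \\
&+ \sum_{\bar r \neq r} \max\big(0,\ \gamma - s_R([x, m_{O_1}, m_{O_2}], r) + s_R([x, m_{O_1}, m_{O_2}], \bar r)\big)
\end{align*}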
Results
large-scale QA
Figure: Results on a QA dataset with 14M statements.
Hashing techniques for efficient memory scoring
Idea: hash the inputs I(x) into buckets, and score only the memories
m_i lying in the same buckets.
word hash: one bucket per dictionary word, containing all sentences
that contain that word.
cluster hash: run K-means to cluster the word embedding vectors
(U_O)_i, giving K buckets; hash a sentence to all buckets that its
words belong to.
simulation QA
Figure: The task is a simple simulation of 4 characters, 3 objects and
5 rooms, with characters moving around, picking up and dropping
objects. (Similar to the 10k dataset of MemN2N.)
simulation QA, sample test results
Figure: Sample test set predictions (in red) for the simulation, in the
setting where the input is word-based, answers are sentences, and an
LSTM is used as the R component of the MemNN.
NTM code induction experiments
Architecture
More sophisticated memory “controller”.
Figure: Content-addressing is implemented by learning similarity
measures, analogous to MemNN. In addition, the controller supports
location-based addressing by implementing a rotational shift of a
weighting.
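The rotational shift is a circular convolution of the addressing weighting
with a shift distribution. A minimal NumPy sketch of that step (my
illustration of the mechanism in Graves et al., not their code):

import numpy as np

def rotate_weighting(w, s):
    # Circular convolution: out_i = sum_j w_j * s_{(i - j) mod N},
    # where w is the current weighting over N memory slots and s is a
    # normalized distribution over shifts.
    n = len(w)
    out = np.zeros(n)
    for i in range(n):
        for j in range(n):
            out[i] += w[j] * s[(i - j) % n]
    return out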
NTM learns a Copy task
Figure: The networks were trained to copy sequences of eight-bit
random vectors, where the sequence lengths were randomized
between 1 and 20. An NTM with an LSTM controller was used.
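The training data for this task is easy to reproduce; the generator below
is a hedged sketch of the data distribution described in the caption, not
the authors' exact pipeline.

import numpy as np

def sample_copy_example(rng, max_len=20, bits=8):
    # One copy-task example: a random binary sequence (length 1..20,
    # eight bits per step) whose target is an exact copy of the input.
    T = int(rng.integers(1, max_len + 1))
    seq = rng.integers(0, 2, size=(T, bits))
    return seq, seq.copy()

rng = np.random.default_rng(0)
x, y = sample_copy_example(rng)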
... on which LSTM fails
Figure: The networks were trained to copy sequences of eight-bit
random vectors, where the sequence lengths were randomized
between 1 and 20. An NTM with an LSTM controller was used.
Summary
Intuition of memory networks vs. standard neural network
models.
MemNN is successful through strongly supervised learning
on QA tasks.
MemN2N uses more realistic end-to-end training, and is
competitive on the same tasks.
NTMs can learn simple memory copy and recall tasks from
input-memory, output-memory training data.
Thank you!
References
Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus.
End-To-End Memory Networks. 2015.
Jason Weston, Sumit Chopra, Antoine Bordes. Memory Networks.
ICLR 2015.
Alex Graves, Greg Wayne, Ivo Danihelka. Neural Turing Machines.
2014.
Nando de Freitas. Deep Learning at Oxford. 2015.

Contenu connexe

Tendances

Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringTraian Rebedea
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher ManningDeep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher ManningBigDataCloud
 
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)Márton Miháltz
 
Issues while working with Multi Layered Perceptron and Deep Neural Nets
Issues while working with Multi Layered Perceptron and Deep Neural NetsIssues while working with Multi Layered Perceptron and Deep Neural Nets
Issues while working with Multi Layered Perceptron and Deep Neural NetsDEBJYOTI PAUL
 
Deep learning based recommender systems (lab seminar paper review)
Deep learning based recommender systems (lab seminar paper review)Deep learning based recommender systems (lab seminar paper review)
Deep learning based recommender systems (lab seminar paper review)hyunsung lee
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsBenjamin Le
 
DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101Felipe Prado
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning AnalyticsXavier Ochoa
 
Practical Deep Learning for NLP
Practical Deep Learning for NLP Practical Deep Learning for NLP
Practical Deep Learning for NLP Textkernel
 
Deep Learning for NLP Applications
Deep Learning for NLP ApplicationsDeep Learning for NLP Applications
Deep Learning for NLP ApplicationsSamiur Rahman
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersRoelof Pieters
 
Cost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention ProcessCost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention ProcessMLAI2
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsRoelof Pieters
 
Machine Learning: Generative and Discriminative Models
Machine Learning: Generative and Discriminative ModelsMachine Learning: Generative and Discriminative Models
Machine Learning: Generative and Discriminative Modelsbutest
 
Recurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas MikolovRecurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas MikolovBhaskar Mitra
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models Chia-Wen Cheng
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning SystemsAnuj Gupta
 

Tendances (20)

Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher ManningDeep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
 
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
 
Issues while working with Multi Layered Perceptron and Deep Neural Nets
Issues while working with Multi Layered Perceptron and Deep Neural NetsIssues while working with Multi Layered Perceptron and Deep Neural Nets
Issues while working with Multi Layered Perceptron and Deep Neural Nets
 
Deep learning based recommender systems (lab seminar paper review)
Deep learning based recommender systems (lab seminar paper review)Deep learning based recommender systems (lab seminar paper review)
Deep learning based recommender systems (lab seminar paper review)
 
Icml2018 naver review
Icml2018 naver reviewIcml2018 naver review
Icml2018 naver review
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender Systems
 
DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
 
Practical Deep Learning for NLP
Practical Deep Learning for NLP Practical Deep Learning for NLP
Practical Deep Learning for NLP
 
Deep Learning for NLP Applications
Deep Learning for NLP ApplicationsDeep Learning for NLP Applications
Deep Learning for NLP Applications
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
 
Deeplearning NLP
Deeplearning NLPDeeplearning NLP
Deeplearning NLP
 
Cost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention ProcessCost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention Process
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word Embeddings
 
Machine Learning: Generative and Discriminative Models
Machine Learning: Generative and Discriminative ModelsMachine Learning: Generative and Discriminative Models
Machine Learning: Generative and Discriminative Models
 
Recurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas MikolovRecurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas Mikolov
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning Systems
 

En vedette

Neural turing machine
Neural turing machineNeural turing machine
Neural turing machinetm_2648
 
Neural Turing Machines
Neural Turing MachinesNeural Turing Machines
Neural Turing MachinesIlya Kuzovkin
 
Neural Turing Machine Tutorial
Neural Turing Machine TutorialNeural Turing Machine Tutorial
Neural Turing Machine TutorialMark Chang
 
Neural Network と Universality について
Neural Network と Universality について  Neural Network と Universality について
Neural Network と Universality について Kato Yuzuru
 
Differentiable neural conputers
Differentiable neural conputersDifferentiable neural conputers
Differentiable neural conputersnaoto moriyama
 
Neural Turing Machines
Neural Turing MachinesNeural Turing Machines
Neural Turing MachinesKato Yuzuru
 
ニューラルチューリングマシン入門
ニューラルチューリングマシン入門ニューラルチューリングマシン入門
ニューラルチューリングマシン入門naoto moriyama
 

En vedette (7)

Neural turing machine
Neural turing machineNeural turing machine
Neural turing machine
 
Neural Turing Machines
Neural Turing MachinesNeural Turing Machines
Neural Turing Machines
 
Neural Turing Machine Tutorial
Neural Turing Machine TutorialNeural Turing Machine Tutorial
Neural Turing Machine Tutorial
 
Neural Network と Universality について
Neural Network と Universality について  Neural Network と Universality について
Neural Network と Universality について
 
Differentiable neural conputers
Differentiable neural conputersDifferentiable neural conputers
Differentiable neural conputers
 
Neural Turing Machines
Neural Turing MachinesNeural Turing Machines
Neural Turing Machines
 
ニューラルチューリングマシン入門
ニューラルチューリングマシン入門ニューラルチューリングマシン入門
ニューラルチューリングマシン入門
 

Similaire à Memory Networks, Neural Turing Machines, and Question Answering

Meta Dropout: Learning to Perturb Latent Features for Generalization
Meta Dropout: Learning to Perturb Latent Features for Generalization Meta Dropout: Learning to Perturb Latent Features for Generalization
Meta Dropout: Learning to Perturb Latent Features for Generalization MLAI2
 
Comparing model coverage and code coverage in Model Driven testing: an explor...
Comparing model coverage and code coverage in Model Driven testing: an explor...Comparing model coverage and code coverage in Model Driven testing: an explor...
Comparing model coverage and code coverage in Model Driven testing: an explor...REvERSE University of Naples Federico II
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You NeedSEMINARGROOT
 
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...Ryo Takahashi
 
Chain-Of-Thought Prompting.pptx
Chain-Of-Thought Prompting.pptxChain-Of-Thought Prompting.pptx
Chain-Of-Thought Prompting.pptxatharva553835
 
Model-based Testing Principles
Model-based Testing PrinciplesModel-based Testing Principles
Model-based Testing PrinciplesHenry Muccini
 
A Multiple Kernel Learning Based Fusion Framework for Real-Time Multi-View Ac...
A Multiple Kernel Learning Based Fusion Framework for Real-Time Multi-View Ac...A Multiple Kernel Learning Based Fusion Framework for Real-Time Multi-View Ac...
A Multiple Kernel Learning Based Fusion Framework for Real-Time Multi-View Ac...Francisco (Paco) Florez-Revuelta
 
[COSCUP 2022] 腳踏多條船-利用 Coroutine在 Software Transactional Memory上進行動態排程
[COSCUP 2022] 腳踏多條船-利用 Coroutine在  Software Transactional Memory上進行動態排程[COSCUP 2022] 腳踏多條船-利用 Coroutine在  Software Transactional Memory上進行動態排程
[COSCUP 2022] 腳踏多條船-利用 Coroutine在 Software Transactional Memory上進行動態排程littleuniverse24
 
A Connectionist Approach to Dynamic Resource Management for Virtualised Netwo...
A Connectionist Approach to Dynamic Resource Management for Virtualised Netwo...A Connectionist Approach to Dynamic Resource Management for Virtualised Netwo...
A Connectionist Approach to Dynamic Resource Management for Virtualised Netwo...Rashid Mijumbi
 
IRJET- Chatbot Using Gated End-to-End Memory Networks
IRJET-  	  Chatbot Using Gated End-to-End Memory NetworksIRJET-  	  Chatbot Using Gated End-to-End Memory Networks
IRJET- Chatbot Using Gated End-to-End Memory NetworksIRJET Journal
 
Brian Klumpe Unification of Producer Consumer Key Pairs
Brian Klumpe Unification of Producer Consumer Key PairsBrian Klumpe Unification of Producer Consumer Key Pairs
Brian Klumpe Unification of Producer Consumer Key PairsBrian_Klumpe
 
Automock: Interaction-Based Mock Code Generation
Automock: Interaction-Based Mock Code GenerationAutomock: Interaction-Based Mock Code Generation
Automock: Interaction-Based Mock Code GenerationSabrina Souto
 
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfTutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfDuy-Hieu Bui
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfPolytechnique Montréal
 
Easily Trainable Neural Network Using TransferLearning
Easily Trainable Neural Network Using TransferLearningEasily Trainable Neural Network Using TransferLearning
Easily Trainable Neural Network Using TransferLearningIRJET Journal
 
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...Databricks
 

Similaire à Memory Networks, Neural Turing Machines, and Question Answering (20)

Meta Dropout: Learning to Perturb Latent Features for Generalization
Meta Dropout: Learning to Perturb Latent Features for Generalization Meta Dropout: Learning to Perturb Latent Features for Generalization
Meta Dropout: Learning to Perturb Latent Features for Generalization
 
Comparing model coverage and code coverage in Model Driven testing: an explor...
Comparing model coverage and code coverage in Model Driven testing: an explor...Comparing model coverage and code coverage in Model Driven testing: an explor...
Comparing model coverage and code coverage in Model Driven testing: an explor...
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
 
Chain-Of-Thought Prompting.pptx
Chain-Of-Thought Prompting.pptxChain-Of-Thought Prompting.pptx
Chain-Of-Thought Prompting.pptx
 
Model-based Testing Principles
Model-based Testing PrinciplesModel-based Testing Principles
Model-based Testing Principles
 
A Multiple Kernel Learning Based Fusion Framework for Real-Time Multi-View Ac...
A Multiple Kernel Learning Based Fusion Framework for Real-Time Multi-View Ac...A Multiple Kernel Learning Based Fusion Framework for Real-Time Multi-View Ac...
A Multiple Kernel Learning Based Fusion Framework for Real-Time Multi-View Ac...
 
Scolari's ICCD17 Talk
Scolari's ICCD17 TalkScolari's ICCD17 Talk
Scolari's ICCD17 Talk
 
Practical ML
Practical MLPractical ML
Practical ML
 
[COSCUP 2022] 腳踏多條船-利用 Coroutine在 Software Transactional Memory上進行動態排程
[COSCUP 2022] 腳踏多條船-利用 Coroutine在  Software Transactional Memory上進行動態排程[COSCUP 2022] 腳踏多條船-利用 Coroutine在  Software Transactional Memory上進行動態排程
[COSCUP 2022] 腳踏多條船-利用 Coroutine在 Software Transactional Memory上進行動態排程
 
A Connectionist Approach to Dynamic Resource Management for Virtualised Netwo...
A Connectionist Approach to Dynamic Resource Management for Virtualised Netwo...A Connectionist Approach to Dynamic Resource Management for Virtualised Netwo...
A Connectionist Approach to Dynamic Resource Management for Virtualised Netwo...
 
IRJET- Chatbot Using Gated End-to-End Memory Networks
IRJET-  	  Chatbot Using Gated End-to-End Memory NetworksIRJET-  	  Chatbot Using Gated End-to-End Memory Networks
IRJET- Chatbot Using Gated End-to-End Memory Networks
 
Brian Klumpe Unification of Producer Consumer Key Pairs
Brian Klumpe Unification of Producer Consumer Key PairsBrian Klumpe Unification of Producer Consumer Key Pairs
Brian Klumpe Unification of Producer Consumer Key Pairs
 
Automock: Interaction-Based Mock Code Generation
Automock: Interaction-Based Mock Code GenerationAutomock: Interaction-Based Mock Code Generation
Automock: Interaction-Based Mock Code Generation
 
rooter.pdf
rooter.pdfrooter.pdf
rooter.pdf
 
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfTutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
 
Chapter11a
Chapter11aChapter11a
Chapter11a
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdf
 
Easily Trainable Neural Network Using TransferLearning
Easily Trainable Neural Network Using TransferLearningEasily Trainable Neural Network Using TransferLearning
Easily Trainable Neural Network Using TransferLearning
 
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
 

Dernier

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 

Dernier (20)

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 

Memory Networks, Neural Turing Machines, and Question Answering

  • 1. 1/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary Memory Networks, Neural Turing Machines, and Question Answering Akram El-Korashy1 1Max Planck Institute for Informatics November 30, 2015 Deep Learning Seminar. Papers by Weston et al. (ICLR2015), Graves et al. (2014), and Sukhbaatar et al. (2015)
  • 2. 2/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary Outline 1 Introduction Intuition and resemblance to human cognition How does it look like? 2 QA Experiments, End-to-End Architecture - MemN2N Training Baselines and Results 3 QA Experiments, Strongly Supervised Architecture - MemNN Training Results 4 NTM code induction experiments
  • 3. 3/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary Intuition and resemblance to human cognition Why memory? Human’s working memory is a capacity for short-term storage of information and its rule-based manipulation. . . Therefore, an NTM1resembles a working memory system, as it is designed to solve tasks that require the application of approximate rules to “rapidly-created variables”. 1 Neural Turing Machine. I will use it interchangeably with Memory Networks, depending on which paper I am citing.
  • 4. 4/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary Intuition and resemblance to human cognition Why memory? Why not RNNs and LSTM? The memory in these models is the state of the network, which is latent (i.e., hidden; no exlpicit access) and inherently unstable over long timescales. [Sukhbaatar2015] Unlike a standard network, NTM interacts with a memory matrix using selective read and write operations that can focus on (almost) a single memory location. [Graves2014]
  • 5. 5/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary Intuition and resemblance to human cognition Why memory networks? How about attention models with RNN encoders/decoders? The memory model is indeed analogous to the attention mechanisms introduced for machine translation.
  • 6. 5/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary Intuition and resemblance to human cognition Why memory networks? How about attention models with RNN encoders/decoders? The memory model is indeed analogous to the attention mechanisms introduced for machine translation. Main differences In a memory network model, the query can be made over multiple sentences, unlike machine translation. The memory model makes several hops on the memory before making an output. The network architecture of the memory scoring is a simple linear layer, as opposed to a sophisticated gated architecture in previous work.
  • 7. 6/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary Intuition and resemblance to human cognition Why memory? What’s the main usage? Memory as non-compact storage Explicitly update memory slots mi on test time by making use of a “generalization” component that determines “what” is to be stored from input x, and “where” to store it (choosing among the memory slots). Storing stories for Question Answering Given a story (i.e., a sequence of sentences), training of the output component of the memory network can learn scoring functions (i.e., similarity) between query sentences and existing memory slots from previous sentences.
  • 8. 7/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary How does it look like? Overview of a memory model A memory model that is trained only end-to-end. Figure: A single layer, and a three-layer memory model [Sukhbaatar2015]
  • 9. 7/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary How does it look like? Overview of a memory model Trained model takes a set of inputs x1, ..., xn to be stored in the memory, a query q, and outputs an answer a. Figure: A single layer, and a three-layer memory model [Sukhbaatar2015]
  • 10. 7/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary How does it look like? Overview of a memory model Each of xi, q, a contains symbols coming from a dictionary with V words. Figure: A single layer, and a three-layer memory model [Sukhbaatar2015]
  • 11. 7/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary How does it look like? Overview of a memory model All x is written to memory up to a fixed buffer size, then find a continuous representation for the x and q. Figure: A single layer, and a three-layer memory model [Sukhbaatar2015]
  • 12. 7/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary How does it look like? Overview of a memory model The continuous representation is then processed via multiple hops to output a. Figure: A single layer, and a three-layer memory model [Sukhbaatar2015]
  • 13. 7/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary How does it look like? Overview of a memory model This allows back-propagation of the error signal through multiple memory accesses back to input during training. Figure: A single layer, and a three-layer memory model [Sukhbaatar2015]
  • 14. 7/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary How does it look like? Overview of a memory model A, B, C are embedding matrices (of size d × V) used to convert the input to the d-dimensional vectors mi. Figure: A single layer, and a three-layer memory model [Sukhbaatar2015]
  • 15. 7/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary How does it look like? Overview of a memory model A match is computed between u and each memory mi by taking the inner product followed by a softmax. Figure: A single layer, and a three-layer memory model [Sukhbaatar2015]
  • 16. 7/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary How does it look like? Overview of a memory model The response vector o from the memory is the weighted sum: o = i pici. Figure: A single layer, and a three-layer memory model [Sukhbaatar2015]
  • 17. 7/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary How does it look like? Overview of a memory model The final prediction (answer to the query) is computed with the help of a weight matrix as: ˆa = Softmax(W(o + u)). Figure: A single layer, and a three-layer memory model [Sukhbaatar2015]
  • 18. 8/27 Introduction QA Experiments, End-to-End QA Experiments, Strongly Supervised NTM code induction experiments Summary Plan 1 Introduction Intuition and resemblance to human cognition How does it look like? 2 QA Experiments, End-to-End Architecture - MemN2N Training Baselines and Results 3 QA Experiments, Strongly Supervised Architecture - MemNN Training Results 4 NTM code induction experiments
  • 19-24. Synthetic QA tasks, supporting subset
  - There are a total of 20 different types of tasks that test different forms of reasoning and deduction.
  - For each question, only a subset of the statements contains the information needed for the answer; the others are essentially irrelevant distractors (e.g., the first sentence in the first example).
  - In the Memory Networks of Weston et al., this supporting subset was explicitly indicated to the model during training. In what is called end-to-end training of memory networks, this information is no longer provided.
  - A task is a set of example problems. A problem is a set of I sentences x_i, where I ≤ 320, a question q, and an answer a.
  - The vocabulary has only V = 177 words. Two versions of the data are used: one with 1,000 training problems per task, and one with 10,000 per task.
  Figure: A given QA task consists of a set of statements, followed by a question whose answer is typically a single word. [Sukhbaatar2015]
  • 25. Architecture - MemN2N: Model Architecture
  K = 3 hops were used, with adjacent weight sharing to ease training and reduce the number of parameters.
  Adjacent weight tying (see the sketch below):
  1 The output embedding of one layer is the input embedding of the layer above: A^{k+1} = C^k.
  2 The answer prediction matrix equals the final output embedding: W^T = C^K.
  3 The question embedding equals the input embedding of the first layer: B = A^1.
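A hedged sketch of how adjacent tying threads K hops together: only K + 1 distinct matrices are stored instead of 2K + 2. The helper names are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bow(E, sentence):
    """BoW vector for a sentence: sum of the columns of the (d, V) embedding E."""
    return E[:, sentence].sum(axis=1)

def memn2n_adjacent(story, query, mats, K=3):
    """K hops with adjacent tying: mats = [A^1, C^1, ..., C^K] (K + 1 matrices).
    Tying gives A^{k+1} = C^k, B = A^1, and W^T = C^K."""
    u = bow(mats[0], query)               # B = A^1
    for k in range(K):
        Ak, Ck = mats[k], mats[k + 1]     # A^{k+1} = C^k
        m = np.stack([bow(Ak, s) for s in story])
        c = np.stack([bow(Ck, s) for s in story])
        p = softmax(m @ u)
        u = p @ c + u                     # u^{k+1} = o^k + u^k
    return softmax(mats[K].T @ u)         # answer: Softmax(W(o^K + u^K)), W^T = C^K
```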
  • 26-27. Architecture - MemN2N: Sentence Representation, Temporal Encoding
  Two sentence representations are compared: bag-of-words (BoW) and Position Encoding (PE).
  - BoW embeds each word and sums the resulting vectors, e.g., m_i = Σ_j A x_ij. This discards word order.
  - PE weights each word embedding by a column vector l_j with entries l_kj = (1 − j/J) − (k/d)(1 − 2j/J), where J is the number of words in the sentence, giving m_i = Σ_j l_j ∘ A x_ij (element-wise product).
  Temporal Encoding: modify the memory vector with a special matrix that encodes temporal information.2 Now m_i = Σ_j A x_ij + T_A(i), where T_A(i) is the ith row of a temporal matrix T_A. All T matrices are learned during training and are subject to the same sharing constraints as hold between A and C. (A sketch of the PE weighting follows.)
  2 The paper gives little detail on what further constraints this matrix should satisfy, if any.
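The PE weighting is just a fixed d × J matrix applied element-wise before summation. A small NumPy sketch of the stated formula (with 1-indexed j and k, as in the paper):

```python
import numpy as np

def position_encoding(J, d):
    """l[k, j] = (1 - j/J) - (k/d) * (1 - 2j/J), for j = 1..J, k = 1..d."""
    j = np.arange(1, J + 1)[None, :]   # word positions within the sentence
    k = np.arange(1, d + 1)[:, None]   # embedding dimensions
    return (1 - j / J) - (k / d) * (1 - 2 * j / J)   # shape (d, J)

def encode_sentence(A, sentence, T_A=None, i=None):
    """m_i = sum_j l_j * (A x_ij), plus T_A(i) when temporal encoding is on."""
    E = A[:, sentence]                           # (d, J) word embeddings
    l = position_encoding(E.shape[1], E.shape[0])
    m = (l * E).sum(axis=1)                      # weight element-wise, then sum
    if T_A is not None:
        m = m + T_A[i]                           # add the i-th temporal row
    return m
```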
  • 28. Training: Loss function and learning parameters
  The embedding matrices A, B, and C, as well as the output matrix W, are learned jointly. The loss is the standard cross-entropy between â and the true label a. Training uses stochastic gradient descent with learning rate η = 0.01 and annealing. (A minimal training sketch follows.)
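A minimal PyTorch sketch of this training setup for a single hop. For brevity the question embedding B is tied to A here, whereas the paper learns B separately, and annealing is approximated with a step scheduler; all of this is illustrative, not the authors' setup.

```python
import torch
import torch.nn.functional as F

d, V = 20, 50
A = torch.nn.Embedding(V, d)            # input embedding (also used as B here)
C = torch.nn.Embedding(V, d)            # output embedding
W = torch.nn.Linear(d, V, bias=False)   # answer prediction matrix
params = [*A.parameters(), *C.parameters(), *W.parameters()]
opt = torch.optim.SGD(params, lr=0.01)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=25, gamma=0.5)  # annealing

def forward(story, query):
    m = A(story).sum(dim=1)             # (N, d) memory vectors (BoW)
    c = C(story).sum(dim=1)             # (N, d) output vectors
    u = A(query).sum(dim=0)             # (d,)  query embedding
    p = F.softmax(m @ u, dim=0)         # attention over the N memories
    return W(p @ c + u)                 # logits over the V answer words

story = torch.randint(V, (3, 4))        # 3 sentences of 4 word ids each
query = torch.randint(V, (3,))
answer = torch.tensor([7])              # true answer word id

loss = F.cross_entropy(forward(story, query).unsqueeze(0), answer)
opt.zero_grad(); loss.backward(); opt.step(); sched.step()
```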
  • 29. Training: Parameters and Techniques
  - RN: learn time invariance by injecting random noise to regularize T_A.
  - LS (linear start): remove all softmax layers except the answer prediction layer; re-insert them once the validation loss stops decreasing. (LS uses a learning rate of η = 0.005 instead of the normal 0.01.)
  - LW: layer-wise, RNN-like weight tying, as opposed to adjacent weight tying.
  - BoW or PE: the sentence representation.
  - joint: training on all 20 tasks jointly vs. independently. [Sukhbaatar2015]
  • 30-35. Baselines and Results
  The variants above (RN, LS, LW, BoW vs. PE, joint vs. independent training) are compared against weakly supervised baselines.
  Take-home messages:
  - More memory hops give improved performance.
  - Joint training on the various tasks sometimes helps.
  Figure: All variants of the end-to-end trained memory model comfortably beat the weakly supervised baseline methods. [Sukhbaatar2015]
  • 36. Baselines and Results: Set of Supporting Facts
  Figure: Instances of successful prediction of the supporting sentences.
  • 37. Plan
  1 Introduction: intuition and resemblance to human cognition; what does it look like?
  2 QA Experiments, End-to-End: architecture (MemN2N), training, baselines and results
  3 QA Experiments, Strongly Supervised: architecture (MemNN), training, results
  4 NTM code induction experiments
  • 38. Architecture - MemNN: I, G, O, R
  The memory network consists of a memory m and four learned components (a skeleton is sketched below):
  1 I (input feature map): converts the incoming input to the internal feature representation.
  2 G (generalization): updates old memories given the new input.
  3 O (output feature map): produces a new output, given the new input and the current memory state.
  4 R (response): converts the output into the desired response format.
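A schematic Python skeleton of the I-G-O-R decomposition, showing only the data flow; the concrete components are placeholders, not the paper's implementation.

```python
from typing import Any, Callable, List

class MemoryNetwork:
    """Memory m plus the four learned components I, G, O, R."""

    def __init__(self, I: Callable, G: Callable, O: Callable, R: Callable):
        self.memory: List[Any] = []
        self.I, self.G, self.O, self.R = I, G, O, R

    def step(self, x):
        feat = self.I(x)                          # I: map input to features
        self.memory = self.G(feat, self.memory)   # G: update the memories
        out = self.O(feat, self.memory)           # O: output features from memory
        return self.R(out)                        # R: format the response

# Simplest G: store I(x) in the next free slot, i.e., m_N = I(x).
append_G = lambda feat, memory: memory + [feat]
```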
  • 39. Architecture - MemNN: Model Flow
  The core of inference lies in the O and R modules. The O module produces output features by finding k supporting memories given x.
  - For k = 1, the highest-scoring supporting memory is retrieved: o_1 = O_1(x, m) = argmax_{i=1,...,N} s_O(x, m_i).
  - For k = 2, a second supporting memory is computed given the first: o_2 = O_2(x, m) = argmax_{i=1,...,N} s_O([x, m_{o1}], m_i).
  - In the single-word response setting, where W is the set of all words in the dictionary, r = argmax_{w∈W} s_R([x, m_{o1}, m_{o2}], w).
  (A sketch of this two-hop inference follows.)
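A hedged sketch of the k = 2 inference, using the paper's bilinear score s(x, y) = Φ_x(x)^T U^T U Φ_y(y); representing the combination [x, m_o1] as a sum of feature vectors is a simplifying assumption here.

```python
import numpy as np

def score(phi_x, phi_y, U):
    """Bilinear matching score s(x, y) = Phi_x(x)^T U^T U Phi_y(y)."""
    return (U @ phi_x) @ (U @ phi_y)

def infer_k2(phi_q, memory_feats, word_feats, U_O, U_R):
    """Two supporting memories, then a single-word response."""
    o1 = max(range(len(memory_feats)),
             key=lambda i: score(phi_q, memory_feats[i], U_O))
    joint = phi_q + memory_feats[o1]              # features of [x, m_o1]
    o2 = max(range(len(memory_feats)),
             key=lambda i: score(joint, memory_feats[i], U_O))
    joint = joint + memory_feats[o2]              # features of [x, m_o1, m_o2]
    r = max(range(len(word_feats)),
            key=lambda w: score(joint, word_feats[w], U_R))
    return o1, o2, r                              # supporting ids + answer word id
```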
  • 40. Training: Max-margin, SGD
  Supporting-sentence annotations are available as part of the training data, so the scoring functions are trained by minimizing a margin ranking loss over the model parameters U_O and U_R using SGD.
  Figure: For a given question x with true response r and supporting sentences m_{o1}, m_{o2} (i.e., k = 2), the loss is minimized over U_O and U_R, where f̄, f̄′, and r̄ range over all choices other than the correct labels, and γ is the margin.
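Written out, the margin ranking loss (reconstructed here from the paper's definitions, since the exact expression appears only in the figure) is:

```latex
\sum_{\bar{f} \neq m_{o_1}} \max\bigl(0,\, \gamma - s_O(x, m_{o_1}) + s_O(x, \bar{f})\bigr)
+ \sum_{\bar{f}' \neq m_{o_2}} \max\bigl(0,\, \gamma - s_O([x, m_{o_1}], m_{o_2}) + s_O([x, m_{o_1}], \bar{f}')\bigr)
+ \sum_{\bar{r} \neq r} \max\bigl(0,\, \gamma - s_R([x, m_{o_1}, m_{o_2}], r) + s_R([x, m_{o_1}, m_{o_2}], \bar{r})\bigr)
```

Each term forces the correct choice to beat every incorrect one by at least the margin γ.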
  • 41-43. Results: large-scale QA
  Figure: Results on a QA dataset with 14M statements.
  Hashing techniques for efficient memory scoring. Idea: hash the input I(x) into buckets, and score only the memories m_i lying in the same buckets.
  - Word hash: one bucket per dictionary word, containing all sentences that contain that word.
  - Cluster hash: run K-means to cluster the word vectors (U_O)_i, giving K buckets; hash a sentence to all buckets to which its words belong.
  (A word-hash sketch follows.)
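A minimal sketch of the word-hash variant as an inverted index; names are illustrative.

```python
from collections import defaultdict

def build_word_hash(sentences):
    """One bucket per dictionary word, mapping to the ids of the sentences
    that contain the word."""
    buckets = defaultdict(set)
    for i, words in enumerate(sentences):
        for w in words:
            buckets[w].add(i)
    return buckets

def candidate_memories(buckets, query_words):
    """Only memories sharing at least one word with the query get scored."""
    cand = set()
    for w in query_words:
        cand |= buckets.get(w, set())
    return cand

sentences = [["john", "is", "in", "the", "kitchen"],
             ["mary", "picked", "up", "the", "ball"]]
buckets = build_word_hash(sentences)
print(candidate_memories(buckets, ["where", "is", "john"]))  # {0}
```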
  • 44. Results: simulation QA
  Figure: The task is a simple simulation of 4 characters, 3 objects, and 5 rooms, with characters moving around, picking up, and dropping objects. (Similar to the 10k dataset of MemN2N.)
  • 45. Results: simulation QA, sample test results
  Figure: Sample test-set predictions (in red) for the simulation, in the setting where the input is word-based, answers are sentences, and an LSTM is used as the R component of the MemNN.
  • 46. Plan
  1 Introduction: intuition and resemblance to human cognition; what does it look like?
  2 QA Experiments, End-to-End: architecture (MemN2N), training, baselines and results
  3 QA Experiments, Strongly Supervised: architecture (MemNN), training, results
  4 NTM code induction experiments
  • 47. Architecture
  A more sophisticated memory "controller".
  Figure: Content-based addressing is implemented by learning similarity measures, analogous to MemNN. In addition, the controller supports location-based addressing by applying a rotational shift to a weighting. (A sketch of the shift follows.)
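The rotational shift is a circular convolution of the current weighting with a shift distribution. A minimal NumPy sketch, assuming s is a length-N distribution over offsets:

```python
import numpy as np

def rotational_shift(w, s):
    """Circular convolution: w_tilde(i) = sum_j w(j) * s((i - j) mod N)."""
    N = len(w)
    return np.array([sum(w[j] * s[(i - j) % N] for j in range(N))
                     for i in range(N)])

w = np.array([0.0, 1.0, 0.0, 0.0])   # weighting focused on memory slot 1
s = np.array([0.0, 1.0, 0.0, 0.0])   # all shift mass on offset +1
print(rotational_shift(w, s))        # -> [0, 0, 1, 0]: focus moved to slot 2
```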
  • 48. NTM learns a Copy task
  Figure: The networks were trained to copy sequences of eight-bit random vectors, with sequence lengths randomized between 1 and 20. An NTM with an LSTM controller was used.
  • 49. ... on which LSTM fails
  Figure: The same copy task and training setup, shown for the LSTM baseline, which fails to solve it.
  • 50. Summary
  - Intuition of memory networks vs. standard neural network models.
  - MemNN is successful through strongly supervised learning on QA tasks.
  - MemN2N uses more realistic end-to-end training and remains competitive on the same tasks.
  - NTMs can learn simple memory copy and recall tasks from input-memory, output-memory training data.
  • 51. Thank you!
  • 52. References
  - End-To-End Memory Networks. Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus, 2015.
  - Memory Networks. Jason Weston, Sumit Chopra, Antoine Bordes, 2015.
  - Neural Turing Machines. Alex Graves, Greg Wayne, Ivo Danihelka, 2014.
  - Deep Learning at Oxford 2015. Nando de Freitas.