Neural Language Model - 2001
[Figure: architecture of the neural LM. Each context word index w_{t-n+1}, ..., w_{t-2}, w_{t-1} is mapped through a shared look-up table (matrix C) to C(w_{t-n+1}), ..., C(w_{t-1}); the concatenated embeddings feed a tanh hidden layer and a softmax output layer whose i-th output is P(w_t = i | context). The table C is shared across word positions, and most of the computation happens in the output layer.]
Bengio et al., A Neural Probabilistic Language Model, NIPS
Proceedings, 2001; Journal of Machine Learning Research
(JMLR), 2003
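
A minimal PyTorch sketch of this architecture (a hypothetical reconstruction; layer sizes are illustrative, not the paper's exact values):

```python
# Sketch of Bengio et al.'s neural probabilistic LM: a shared look-up
# table C embeds the n-1 context words, a tanh hidden layer combines
# them, and a softmax gives P(w_t = i | context) for every word i.
import torch
import torch.nn as nn

class NPLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=60, hidden_dim=50, context=4):
        super().__init__()
        self.C = nn.Embedding(vocab_size, emb_dim)       # shared across positions
        self.hidden = nn.Linear(context * emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)     # most computation here

    def forward(self, context_ids):                      # (batch, context)
        x = self.C(context_ids).flatten(1)               # concat C(w_{t-n+1})..C(w_{t-1})
        h = torch.tanh(self.hidden(x))
        return torch.log_softmax(self.out(h), dim=-1)    # log P(w_t = i | context)
```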
Distributed Representations
Goal: embed syntactic or semantic information in a distributed representation (a vector).
Why, and where, can we find such information?
What defines the meaning of a word?
Context! (see the toy sketch after the list below)
Approach
Word Embeddings
Character Embeddings
Contextualized Word Embeddings
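
To make "context defines meaning" concrete, here is a toy sketch using gensim's Word2Vec (assuming gensim is installed; the two-sentence corpus is invented for illustration):

```python
# Learn word vectors from context with Word2Vec; with real data,
# words that appear in similar contexts end up with similar vectors.
from gensim.models import Word2Vec

corpus = [["the", "river", "overflowed", "the", "bank"],
          ["the", "bank", "accepts", "cash", "deposits"]]
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=50)
print(model.wv.most_similar("bank"))  # neighbors reflect observed contexts
```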
Character Embeddings
Capturing intra-word morphological and shape information is useful for tasks such as:
parts-of-speech (POS) tagging
named-entity recognition (NER)
Santos and Guimaraes [31] applied character-level representations, along with word embeddings, for NER, achieving state-of-the-art results on Portuguese and Spanish corpora.
Advantage
Handles out-of-vocabulary (OOV) words, whose representations can still be composed from characters (see the sketch below)
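
A minimal sketch of a character-level CNN embedding concatenated with a word embedding, in the spirit of Santos and Guimaraes' approach (assuming PyTorch; the class name and sizes are illustrative assumptions):

```python
# Character CNN + word embedding: even an OOV word (unknown word id)
# still gets a meaningful vector from its characters.
import torch
import torch.nn as nn

class CharWordEmbedding(nn.Module):
    def __init__(self, n_chars, n_words, char_dim=25, word_dim=100, n_filters=30):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size=3, padding=1)
        self.word_emb = nn.Embedding(n_words, word_dim)

    def forward(self, char_ids, word_id):
        # char_ids: (batch, word_len); word_id: (batch,)
        c = self.char_emb(char_ids).transpose(1, 2)     # (batch, char_dim, word_len)
        c = torch.relu(self.conv(c)).max(dim=2).values  # max-pool over characters
        return torch.cat([self.word_emb(word_id), c], dim=-1)
```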
Contextualized Word Embeddings
Disadvantage of global word embeddings (Word2Vec and GloVe): polysemy (⼀詞多義), one word form with many senses
1. "The bank will not be accepting cash on Saturdays."
2. "The river overflowed the bank."
Deep contextual word embeddings
Embeddings from Language Models (ELMo): extracts intermediate-layer representations from the biLM (see the sketch below)
Pre-trained language model
Embedding from Language Model (ELMo)
Generative pre-training (GPT)
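
A minimal sketch of ELMo's learned combination of biLM layers (the "scalar mix" ELMo_t = gamma * sum_j softmax(s)_j * h_{t,j}), assuming PyTorch; shapes are illustrative:

```python
# ELMo-style scalar mix: a softmax-normalized weight per biLM layer
# plus a global scale gamma, applied to stacked layer representations.
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    def __init__(self, n_layers):
        super().__init__()
        self.s = nn.Parameter(torch.zeros(n_layers))  # per-layer weights
        self.gamma = nn.Parameter(torch.ones(1))      # global scale

    def forward(self, layer_reps):                    # (n_layers, seq, dim)
        w = torch.softmax(self.s, dim=0)
        return self.gamma * (w[:, None, None] * layer_reps).sum(0)
```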
Applications
Convolutional kernel → specific n-gram feature extractor (a minimal sketch appears at the end of this slide)
Tasks
sentence classification
sentiment classification
subjectivity classification
question type classification
Time-delay neural network (TDNN)
Convolutions are performed across all windows throughout the sentence at the
same time
Dynamic multi-pooling CNN (DMCNN)
dynamic k-max pooling
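
A minimal sketch of a CNN sentence classifier in the style of Kim (2014), where each Conv1d kernel of width n acts as an n-gram feature extractor followed by max-over-time pooling (assuming PyTorch; sizes are illustrative):

```python
# Kernels of widths 3/4/5 extract 3/4/5-gram features; max-over-time
# pooling keeps the strongest activation of each feature map.
import torch
import torch.nn as nn

class SentenceCNN(nn.Module):
    def __init__(self, vocab, emb_dim=100, n_filters=100, widths=(3, 4, 5), n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb_dim)
        self.convs = nn.ModuleList([nn.Conv1d(emb_dim, n_filters, w) for w in widths])
        self.fc = nn.Linear(n_filters * len(widths), n_classes)

    def forward(self, ids):                            # (batch, seq)
        x = self.emb(ids).transpose(1, 2)              # (batch, emb, seq)
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(feats, dim=1))        # class scores
```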
Recurrent Neural Network (RNN) in NLP
Idea and purpose
Processing sequential information
Encode sequential information of arbitrary length into a fixed-size vector
Output depends on previous results and the current input (see the sketch below)
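
A minimal NumPy sketch of the vanilla RNN recurrence (function and parameter names are illustrative):

```python
# The hidden state h carries the sequential information; each step
# depends on the previous state and the current input, and the final
# h is a fixed-size summary of a variable-length sequence.
import numpy as np

def rnn_encode(inputs, W_xh, W_hh, b, h0):
    h = h0
    for x in inputs:                      # one step per token vector
        h = np.tanh(W_xh @ x + W_hh @ h + b)
    return h                              # fixed-size encoding
```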
Applications
Language model
Machine translation
Speech recognition
Image captioning
RNN for word-level classification
Bidirectional LSTM for NER
[Figure: a bidirectional LSTM over the sentence "This is a book"; each token t is represented by the concatenation [h^b_t ; h^f_t] of its backward and forward hidden states.]
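
A minimal sketch of such a BiLSTM tagger (assuming PyTorch; a real NER system, e.g. with a CRF output layer, would be more involved):

```python
# Each token is scored from the concatenated forward/backward states
# [h^f_t ; h^b_t] that PyTorch's bidirectional LSTM produces.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab, n_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_tags)  # [h^f_t ; h^b_t] -> tag scores

    def forward(self, ids):                      # (batch, seq)
        h, _ = self.lstm(self.emb(ids))          # (batch, seq, 2*hidden)
        return self.fc(h)                        # one tag score vector per token
```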
RNN for sentence-level classification
LSTM for sentiment classi cation
Recursive Neural Network in NLP
Idea
Language exhibits a natural recursive structure, so trees are a natural way to model it
A compositional function combines the representations of words or sub-phrases to compute the representation of a higher-level phrase (see the sketch below)
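
A minimal NumPy sketch of such a compositional function applied recursively over a binary parse tree (the names and the nested-tuple tree encoding are illustrative assumptions):

```python
# Parent representation p = tanh(W [left; right] + b), with the same
# W, b shared at every node of the tree.
import numpy as np

def compose(left, right, W, b):
    return np.tanh(W @ np.concatenate([left, right]) + b)

def encode_tree(node, embeddings, W, b):
    if isinstance(node, str):                 # leaf: a word
        return embeddings[node]
    left, right = node                        # internal node: (left, right)
    return compose(encode_tree(left, embeddings, W, b),
                   encode_tree(right, embeddings, W, b), W, b)

# e.g. encode_tree((("the", "river"), "overflowed"), embeddings, W, b)
```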
Unsupervised sentence representation learning
sentence encoders
seq2seq model
the encoder could be seen as a generic feature extractor (sketched below)
Ref. Sequence to sequence model: Introduction and concepts (https://towardsdatascience.com/sequence-to-sequence-model-introduction-and-concepts-44d9b41cd42d)
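
A minimal sketch of a seq2seq-style encoder used as such a feature extractor (assuming PyTorch; taking the final GRU state as the sentence representation is one common choice):

```python
# The encoder half of a seq2seq model: its final hidden state is a
# generic fixed-size feature vector for the input sentence.
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    def __init__(self, vocab, emb_dim=100, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)

    def forward(self, ids):               # (batch, seq)
        _, h = self.gru(self.emb(ids))    # h: (1, batch, hidden)
        return h.squeeze(0)               # fixed-size sentence vector
```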
Thank you for your attention
Q & A
References
Recent Trends in Deep Learning Based Natural Language Processing (https://arxiv.o
Deep Learning for NLP: An Overview of Recent Trends (https://medium.com/dair-ai/overview-of-recent-trends-d0d8f40a776d)
8 Major Milestones in 15 Years of NLP History (https://zhuanlan.zhihu.com/p/47239
Exclusive | One Article to Understand Natural Language Processing (NLP), with learning resources (https://tw.saowen.com/a/0c1d7d1765b999218654702c1e4d2d0e71c5e138141e