(Deep) Neural Network & Text Mining
Piji Li 
lipiji.pz@gmail.com
Outline
• Deep Learning
  - Story
• DL for NLP & Text Mining
  - Words
  - Sentences
  - Documents
Outline
• Deep Learning
  - Story
• DL for NLP & Text Mining
  - Words
  - Sentences
  - Documents
BP (Back-Propagation)
D.E. Rumelhart, G.E. Hinton, R.J. Williams. "Learning representations by back-propagating errors." Nature 323 (1986): 533–536.
Pros: solved the learning problem; loosely inspired by biological systems; …
Cons: hard to train (non-convex, needs tricks); hard to analyze theoretically; small training sets...
1986
DBN (Deep Belief Nets)
Hinton, Geoffrey, Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18, no. 7 (2006): 1527-1554.
Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313, no. 5786 (2006): 504-507.
2006 
• Unsupervised layer-wise pre-training
•Supervised fine-tuning 
•Feature learning & distributed representations 
•Large scale datasets 
•Tricks (learning rate, mini-batch size, momentum, etc.) 
•https://github.com/lipiji/PG_DEEP
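To make the bullet points above concrete, here is a toy NumPy sketch of greedy layer-wise pre-training with Bernoulli RBMs and one-step contrastive divergence (CD-1). It is my own illustration, not the PG_DEEP code linked above; the layer sizes, learning rate, and random binary data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden(self, v):
        p = sigmoid(v @ self.W + self.b_h)
        return p, (rng.random(p.shape) < p).astype(float)

    def visible(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1(self, v0):
        ph0, h0 = self.hidden(v0)
        pv1 = self.visible(h0)                    # one Gibbs step: reconstruct the visibles
        ph1, _ = self.hidden(pv1)
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        self.b_v += self.lr * (v0 - pv1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)

# Greedy layer-wise pre-training: each RBM learns on the hidden activations of
# the one below; the stacked weights can then initialise a feed-forward net for
# supervised fine-tuning with back-propagation.
X = (rng.random((256, 784)) < 0.3).astype(float)   # toy binary "data"
data, stack = X, []
for n_hidden in (256, 64):
    rbm = RBM(data.shape[1], n_hidden)
    for _ in range(5):                             # a few CD-1 epochs
        for i in range(0, len(data), 32):
            rbm.cd1(data[i:i + 32])
    stack.append(rbm)
    data = rbm.hidden(data)[0]                     # pass activations upward
```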
CD-DNN-HMM
Dahl, George E., Dong Yu, Li Deng, and Alex Acero. "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 1 (2012): 30-42.
Winner of the 2013 IEEE Signal Processing Society Best Paper Award
2011 
PhD candidate of Prof. Hinton, intern at MSR 
GPU
ImageNet Classification
[Bar chart: ImageNet classification error (y-axis 0–0.4) for 2012, 2013, and 2014, with the annotations "First Blood", "Deep", "DNN", "Deeper", "Network in Network"; the error drops each year.]
http://image-net.org/
• 1000 categories and 1.2 million training images
Li Fei-Fei: ImageNet Large Scale Visual Recognition Challenge, 2014
Nowadays
People: Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Andrew Ng
Industry: Google, Baidu, Facebook, IBM
Papers: ICCV, CVPR, ECCV; ICML, NIPS, AAAI, ICLR; ACL, EMNLP, COLING
Startup Companies using (Deep) Machine Learning:
Outline
• Deep Learning
  - Story
• DL for NLP & Text Mining
  - Words
  - Sentences
  - Documents
NN & Words
• Distributed Representation
  - A Neural Probabilistic Language Model
  - Word2Vec
• Application
  - Semantic Hierarchies of Words
  - Machine Translation
V(king) – V(queen) + V(woman) ≈ V(man)
Neural Probabilistic Language Model
• Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A Neural Probabilistic Language Model." Journal of Machine Learning Research 3 (2003): 1137-1155.
[Architecture figure: the n context word vectors are concatenated and fed through a tanh hidden layer (parameters H, d) to an output layer (parameters U, W, b) of size |V|.]
- Maximizes the penalized log-likelihood: L = (1/T) Σ_t log P(w_t | w_{t−n+1}, …, w_{t−1}) + R(θ)
- Softmax: P(w_t | context) = exp(y_{w_t}) / Σ_i exp(y_i)
- where y = b + Wx + U tanh(d + Hx), with x the concatenation of the context word vectors
- Training: BP
Better than n-gram models:
1. learns word vectors;
2. no smoothing needed, since p(w | context) ∈ (0, 1).
Drawback: time-consuming (the softmax runs over the full vocabulary).
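As a concrete reading of the notation above, here is a tiny NumPy sketch of the NPLM forward pass, y = b + Wx + U tanh(d + Hx) followed by a softmax. The sizes and random weights are illustrative assumptions, not the paper's setup.

```python
import numpy as np

V, m, h, n = 10000, 50, 100, 3               # vocab size, embedding dim, hidden units, context length
rng = np.random.default_rng(0)

C = 0.01 * rng.standard_normal((V, m))       # word feature matrix (the word vectors)
H = 0.01 * rng.standard_normal((h, n * m))   # hidden-layer weights
d = np.zeros(h)                              # hidden-layer bias
U = 0.01 * rng.standard_normal((V, h))       # hidden-to-output weights
W = 0.01 * rng.standard_normal((V, n * m))   # direct input-to-output connections
b = np.zeros(V)                              # output bias

def forward(context_ids):
    x = C[context_ids].reshape(-1)           # concatenate the n context word vectors
    y = b + W @ x + U @ np.tanh(d + H @ x)   # unnormalised scores over the vocabulary
    p = np.exp(y - y.max())
    return p / p.sum()                       # softmax: p(w_t | context)

probs = forward([17, 42, 7])
print(probs.shape, probs.sum())              # (10000,) 1.0
```

Scoring even one context touches all |V| output units, which is exactly the time-consuming step flagged above.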
Word2Vec
• Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. "Efficient estimation of word representations in vector space." ICLR (2013).
- Large improvements in accuracy, lower computational cost.
- Takes less than a day to train on a 1.6-billion-word dataset.
Word2Vec - Hierarchical Softmax
• Morin, Frederic, and Yoshua Bengio. "Hierarchical probabilistic neural network language model." AISTATS, vol. 5, pp. 246-252. 2005.
• Mnih, Andriy, and Geoffrey E. Hinton. "A scalable hierarchical distributed language model." NIPS, pp. 1081-1088. 2009.
[Figure: a toy Huffman tree with node counts 5, 3, 2, 2, 1; internal-node parameter vectors θ1, θ2 and leaf words w1, w2, w3.]
• Hierarchical strategies: reduce the per-word cost from O(|V|) to O(log|V|)
• Word2Vec: Huffman tree, so frequent words get short codes
(The flat softmax over all |V| words is the time-consuming step this replaces; a sketch follows.)
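A minimal sketch of the idea, assuming a toy tree rather than word2vec's actual data structures: the probability of a word is a product of binary sigmoid decisions along its Huffman path, so one prediction costs O(log|V|) dot products instead of O(|V|).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hs_probability(h, path_thetas, code):
    """h: context vector; path_thetas: parameter vectors of the inner nodes on the
    path from the root to word w; code: w's Huffman code (0 = one branch, 1 = the other)."""
    p = 1.0
    for theta, bit in zip(path_thetas, code):
        s = sigmoid(h @ theta)
        p *= s if bit == 0 else (1.0 - s)    # one binary decision per inner node
    return p

rng = np.random.default_rng(0)
h = rng.standard_normal(100)                            # toy context vector
path = [rng.standard_normal(100) for _ in range(3)]     # 3 inner nodes ~ log2(|V|)
print(hs_probability(h, path, code=[0, 1, 1]))
```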
Word2Vec - CBOW
• Log-likelihood: L = Σ_w log p(w | Context(w))
• Conditional probability p(w | Context(w)): computed with hierarchical softmax
• Training: SGD
Blog: http://blog.csdn.net/itplus/article/details/37969979
Word2Vec - Skip-gram
CBOW vs Skip-gram
Semantic Relation 
Syntactic Relation
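For hands-on comparison, both architectures are available in the open-source gensim library (a later third-party re-implementation, assumed here, not the code behind the papers). A minimal sketch on a toy corpus:

```python
from gensim.models import Word2Vec

sentences = [["the", "king", "rules", "the", "kingdom"],
             ["the", "queen", "rules", "the", "kingdom"],
             ["a", "man", "and", "a", "woman"]]

cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)             # CBOW
skipgram = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, hs=1)   # skip-gram + hierarchical softmax

print(cbow.wv.most_similar("king", topn=3))   # meaningful only with a real corpus
```

In Mikolov et al.'s comparison, skip-gram is stronger on the semantic questions, while CBOW trains faster and does well on the syntactic ones.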
Word2Vec - Negative Sampling
Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." NIPS (2013).
CBOW & Skip-gram
Blog: http://blog.csdn.net/itplus/article/details/37969979 
Phrase Skip-Gram Results 
Negative sampling: noise words drawn from the unigram distribution raised to the 3/4 power.
Subsampling: frequent words are discarded with probability 1 − sqrt(t / f(w)). (Both are sketched below.)
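A small NumPy sketch of the two tricks, with made-up word frequencies; the constants follow the paper's formulas (noise distribution proportional to the unigram distribution raised to the 3/4 power, discard probability 1 − sqrt(t / f(w))).

```python
import numpy as np

freq = {"the": 0.05, "king": 0.0002, "queen": 0.0001}   # toy relative frequencies
t = 1e-5                                                # subsampling threshold

def keep_probability(word):
    # subsampling of frequent words: keep w with probability sqrt(t / f(w))
    return min(1.0, np.sqrt(t / freq[word]))

# negative sampling: draw noise words from the unigram distribution ^ 3/4
words = list(freq)
p_noise = np.array([freq[w] ** 0.75 for w in words])
p_noise /= p_noise.sum()
negatives = np.random.default_rng(0).choice(words, size=5, p=p_noise)

print(keep_probability("the"), negatives)
```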
Word2Vec Properties
•Vector spaces 
-Mapping different vector spaces 
-Machine Translation 
-… 
•Embedding offsets 
-V(king) –V(queen) + V(woman) ≈ V(man) 
-Complicated Semantic Relations: Hierarchies 
-… 
•?
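The offset property can be checked with nothing more than cosine similarity. Below is a toy sketch in which random vectors stand in for trained embeddings; with real word2vec vectors, the analogy king − queen + woman does land near man.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["king", "queen", "man", "woman", "apple"]
E = {w: rng.standard_normal(50) for w in vocab}          # stand-in embeddings

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c):
    """Return the word d maximising cos(V(a) - V(b) + V(c), V(d))."""
    target = E[a] - E[b] + E[c]
    return max((w for w in vocab if w not in {a, b, c}),
               key=lambda w: cosine(target, E[w]))

print(analogy("king", "queen", "woman"))                 # ≈ "man" with real embeddings
```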
Word2Vec Properties
•Vector spaces 
-Mapping different vector spaces 
-Machine Translation 
-… 
•Embedding offsets 
-V(king) –V(queen) + V(woman) ≈ V(man) 
-Complicated Semantic Relations: Hierarchies 
-… 
•?
Machine Translation
• Mikolov, Tomas, Quoc V. Le, and Ilya Sutskever. "Exploiting similarities among languages for machine translation." arXiv preprint arXiv:1309.4168 (2013).
Translation Matrix 
English (left) Spanish (right). 
• Devlin, Jacob, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard Schwartz, and John Makhoul. "Fast and Robust Neural Network Joint Models for Statistical Machine Translation." ACL 2014 Best Long Paper.
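A sketch of the translation-matrix idea from the Mikolov et al. paper above: given embeddings X for source words and Z for their known translations (random toy data here; the dictionary, dimensions, and data are assumptions), fit one linear map W by least squares and translate new words by nearest neighbour in the target space.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 300))            # e.g. English vectors of dictionary words
Z = rng.standard_normal((5000, 300))            # vectors of their Spanish translations
W, *_ = np.linalg.lstsq(X, Z, rcond=None)       # min_W ||X W - Z||^2

def translate(x_src, target_vectors, target_words):
    """Nearest target word to the mapped source vector."""
    z = x_src @ W
    sims = target_vectors @ z / (np.linalg.norm(target_vectors, axis=1) * np.linalg.norm(z))
    return target_words[int(np.argmax(sims))]
```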
Word2Vec Properties
•Vector spaces 
-Mapping different vector spaces 
-Machine Translation 
-… 
•Embedding offsets 
-V(king) –V(queen) + V(woman) ≈ V(man) 
-Complicated Semantic Relations: Hierarchies 
-… 
•?
Semantic Hierarchies
• Fu, Ruiji, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. "Learning semantic hierarchies via word embeddings." ACL, 2014.
hypernym–hyponym relations
More relations?
Cluster the offsets y − x, then learn a projection for each cluster (see the sketch below):
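A rough sketch of that clustering-plus-projection step on assumed toy data; scikit-learn's KMeans and plain least squares stand in for whatever the authors actually used.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))   # hyponym embeddings x
Y = rng.standard_normal((1000, 50))   # hypernym embeddings y

# 1) cluster the offset vectors y - x
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Y - X)

# 2) fit one linear projection per cluster so that x @ M_k approximates y inside cluster k
projections = {}
for k in range(10):
    Xk, Yk = X[labels == k], Y[labels == k]
    M, *_ = np.linalg.lstsq(Xk, Yk, rcond=None)
    projections[k] = M
```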
Relation Classification
• Zeng, Daojian, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. "Relation Classification via Convolutional Deep Neural Network." COLING 2014 Best Paper.
Cause-Effect relation: "The [fire]e1 inside WTC was caused by exploding [fuel]e2"
Steps: 1) word vectors; 2) lexical-level features; 3) sentence-level features via CNN;
4) concatenate the features and feed them into a softmax classifier
SENNA
NN & Sentences
• Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." ACL 2014.
• Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Models for Compositional Distributed Semantics." arXiv preprint arXiv:1404.4641 (2014).
• Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Distributed Representations without Word Alignment." arXiv preprint arXiv:1312.6173 (2013).
• Kim, Yoon. "Convolutional Neural Networks for Sentence Classification." arXiv (2014).
• Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
Phil Blunsom, Oxford: http://www.cs.ox.ac.uk/people/phil.blunsom/
Convolutional Neural Networks
• Best results in computer vision (LeNet-5; ImageNet 2012 ~ now)
• Shared weights: each convolutional filter produces a feature map
• Invariance via pooling (mean-, max-, stochastic)
http://yann.lecun.com/exdb/lenet/ 
C1: (5*5+1)*6=156 parameters; 
156*(28*28)=122,304 connections 
Convolutional windows over sentences (sketched below)
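A "convolutional window over a sentence" just slides a shared-weight filter across consecutive word vectors and pools the resulting feature map; a minimal NumPy sketch with toy embeddings (the filter width, dimensions, and tanh non-linearity are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
sentence = rng.standard_normal((7, 50))     # 7 words, 50-dim embeddings
k, d = 3, 50                                # filter width and embedding dimension
W = 0.01 * rng.standard_normal(k * d)       # one convolutional filter (shared weights)
b = 0.0

feature_map = np.array([np.tanh(W @ sentence[i:i + k].reshape(-1) + b)
                        for i in range(len(sentence) - k + 1)])
feature = feature_map.max()                 # max-pooling -> one position-invariant feature
```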
CNN for Modelling Sentences
• Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." ACL 2014.
• Supervision depends on the task
• Dynamic k-max pooling (sketched below)
• Folding: sums every pair of adjacent rows (sketched below)
• Multiple feature maps
  - different features: word order, n-grams, sentence-level features
  - serve many tasks
• Word vectors: learned during training
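Toy NumPy sketches of the two DCNN-specific operations, k-max pooling and folding, on an assumed feature map of shape (rows, sentence length); these illustrate the definitions above, not the authors' implementation.

```python
import numpy as np

def k_max_pooling(feature_map, k):
    """Keep, in each row, the k largest values in their original left-to-right order."""
    idx = np.sort(np.argsort(feature_map, axis=1)[:, -k:], axis=1)
    return np.take_along_axis(feature_map, idx, axis=1)

def folding(feature_map):
    """Sum every pair of adjacent rows, halving the number of rows."""
    return feature_map[0::2] + feature_map[1::2]

fm = np.random.default_rng(0).standard_normal((4, 9))
print(folding(k_max_pooling(fm, k=5)).shape)   # (2, 5)
```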
CNN for Sentence Classification
• Kim, Yoon. "Convolutional Neural Networks for Sentence Classification." arXiv (2014).
New: multi-channel
A model with two sets of word vectors; each set is treated as a 'channel' and each filter is applied to both channels.
Pre-trained vectors: word2vec
Multilingual Models
• Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Models for Compositional Distributed Semantics." arXiv preprint arXiv:1404.4641 (2014).
Sentences → Documents
Word vectors: learned
Objective: aligned parallel sentences should receive similar representations (margin-based noise-contrastive loss)
CVM: SUM or …
Word2Vec Extension
• Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
Sentiment Analysis and Document Classification
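gensim's Doc2Vec is a convenient third-party re-implementation of these paragraph vectors (an assumption here, not the authors' code). A minimal sketch on a toy corpus; the inferred document vectors could then feed a sentiment or document classifier.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=["great", "movie", "loved", "it"], tags=["pos_0"]),
        TaggedDocument(words=["boring", "plot", "bad", "acting"], tags=["neg_0"])]

model = Doc2Vec(docs, vector_size=100, window=5, min_count=1, epochs=20)
vec = model.infer_vector(["really", "great", "acting"])   # vector for an unseen document
```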
SENNA
Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. "Natural language processing (almost) from scratch." The Journal of Machine Learning Research 12 (2011).
Word embeddings 
ConvNet
NN & Documents
• Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
• Denil, Misha, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, and Nando de Freitas. "Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network." arXiv preprint arXiv:1406.3830 (2014).
[Figure: convnet feature visualizations for layers 1, 4, and 5, from Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." ECCV 2014.]
How about text?
- Word salience? (word/entity/topic extraction)
- Sentence salience?
Web Search | Learning to Rank
• Huang, Po-Sen, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. "Learning deep structured semantic models for web search using clickthrough data." CIKM, 2013.
NN & Topic Model
• Wan, Li, Leo Zhu, and Rob Fergus. "A hybrid neural network-latent topic model." AISTATS, 2012.
• Larochelle, Hugo, and Stanislas Lauly. "A neural autoregressive topic model." NIPS, 2012.
Hybrid NN & Topic Model
• Wan, Li, Leo Zhu, and Rob Fergus. "A hybrid neural network-latent topic model." AISTATS, 2012.
Tuning
• #layers, #hidden units
• Learning rate
• Momentum
• Weight decay
• Mini-batch size
• …
TO BE A GOOD TUNER
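One practical way to become a good tuner is plain random search over the knobs listed above. The ranges below are illustrative assumptions, and train_and_evaluate is a hypothetical stand-in for a real training run.

```python
import random

random.seed(0)

def train_and_evaluate(config):
    # Hypothetical placeholder: swap in an actual training + validation run.
    return random.random()

search_space = {
    "n_layers":      [2, 3, 4],
    "n_hidden":      [128, 256, 512],
    "learning_rate": [0.3, 0.1, 0.03, 0.01],
    "momentum":      [0.5, 0.9, 0.99],
    "weight_decay":  [0.0, 1e-4, 1e-3],
    "batch_size":    [16, 32, 64, 128],
}

best_score, best_config = float("-inf"), None
for _ in range(20):
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = train_and_evaluate(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_config)
```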
Thanks! 
Q&A
