Lun-Wei Ku
NLPSA, Academia Sinica
Ubiquitous Natural Language Processing:
Basic Concepts, Techniques, and Tools
Speaker
Lecturer: Lun-Wei Ku
Currently: Assistant Research Fellow, IIS, Academia Sinica
Adjunct Assistant Professor, NCTU
• Working on NLP and Sentiment Analysis
• Running NLPSA Lab
http://www.lunweiku.com/
http://academiasinicanlplab.github.io/
• Currently On-going Projects:
– Graph Embedding, Emotion Enabled Dialog System, Cross-lingual Text
Suggestion, Proactive Dialog Generation from Images and Texts
2
Outline
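9:30 - 10:30 What Is Natural Language Processing?
10:30 - 10:50 Tea break
10:50 - 12:30 Tools and Resources for Chinese and English Text Processing
12:30 - 13:20 Lunch
13:20 - 15:00 Challenges of NLP on the Web and Social Media
15:00 - 15:20 Tea break
15:20 - 17:00 NLP Trends and Industry Applications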
3
Section 1
What Is Natural Language Processing?
page 4
Natural Language
• As opposed to machine language
• The language humans use to communicate
page 5
What is Natural Language
Processing?
• Natural language processing (NLP) is a field of computer
science, artificial intelligence and computational
linguistics concerned with the interactions
between computers and human (natural) languages, and, in
particular, concerned with programming computers to
fruitfully process large natural language corpora. Challenges in
natural language processing frequently involve natural
language understanding, natural language
generation (frequently from formal, machine-readable logical
forms), connecting language and machine perception, dialog
systems, or some combination thereof. (Wikipedia)
page 6
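What Is Natural Language Processing?
• Natural language processing (NLP) is a subfield of artificial intelligence and linguistics. It studies how to process and make use of natural language; natural language understanding means getting computers to "understand" human language.
• Natural language generation systems convert computer data into natural language. Natural language understanding systems convert natural language into a form that computer programs can process more easily. (Wikipedia)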
page 7
Natural Language Processing
• Is an AI-complete problem
page 8
Domain (1)
• Biomedical
• Cognitive Modeling and
Psycholinguistics
• Dialogue and Interactive Systems
• Discourse and Pragmatics
• Generation and Summarization
• Information Extraction, Retrieval,
Question Answering, Document
Analysis and NLP Applications
• Machine Learning
• Machine Translation
page 9
• Multidisciplinary
• Multilinguality
• Phonology, Morphology and Word
Segmentation
• Resources and Evaluation
• Semantics
• Sentiment Analysis and Opinion
Mining
• Social Media
• Speech
• Tagging, Chunking, Syntax and
Parsing
• Vision, Robotics and Grounding
Domain (2)
• Text to speech / speech synthesis
• Speech recognition
• Chinese word segmentation
• Part-of-speech tagging
• Parsing
• Natural language generation
• Text categorization
• Information retrieval
• Information extraction
• Text proofing
• Question answering
• Machine translation
• Automatic summarization
• Textual entailment
page 10
Applications (1)
• IBM Watson: Jeopardy
https://www.youtube.com/watch?v=WFR3lOm_xhE
• Google Translate / Google小姐 (“Miss Google”)
page 11
Applications (2)
• Spam filtering <-> Ads pushing
– Google AdSense and so many others
• Spelling Correction, Grammar
– Grammarly, a free grammar checker:
https://www.grammarly.com/
– Duolingo https://www.duolingo.com/
– Pigai (批改網) https://www.pigai.org/
– …
page 12
Applications (3)
• Paper Generator
– Mathgen http://thatsmathematics.com/mathgen/
• Poem Generator
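“The Sound of Autumn Insects” (《秋蟲的聲音》)
– When luck is about to arrive at your door
– Not even the sound of autumn insects remains
– The lure of your eyes
– Flutters across the sky
– As if someone had shut the door for a few days
– I, a charming face
– Sometimes there is no need for another sun
– To light the earth into a planet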
page 13
Applications (4)
• Problem Solver
– Math solver: https://www.cymath.com/
Step by step, NLP + others (graph, formula, …)
page 14
Applications (5)
• AI doctor
– IBM Watson Health
• Optimize performance
• Engaged consumers
• Enable effective care
• Manage population health
– Why is AI doctor related
to NLP?
• MedNLP: medical records, communication…
page 15
Applications (6)
• Summarization
– Best demonstration: 谷阿莫 *blog.investis.com
• Sentiment/Opinion/Review
• Social Media/Network application
– Full of texts!
*techxb.com
page 16
Application: Multi-modal NLP
• Captioning
page 17
Application: Multi-modal NLP
• Story Telling
page 18
Other Close Disciplines
• Artificial Intelligence (AI)
• Information Retrieval (IR)
• Machine Learning (ML)
• Human Computer Interaction (HCI)
page 19
NLP and AI
• NLP handles the input and output of unstructured information for AI applications.
• AI applications are expected to write and speak like people.
• NLP is becoming more and more important in AI.
• However, NLP is challenging.
page 20
NLP and IR
• NLP borrows some concepts from IR, especially term-weighting schemes.
• For IR, efficiency is very important. Some time-critical NLP tasks also borrow IR ideas to save time, e.g., clustering or offline preprocessing.
page 21
NLP and ML
• In the past, NLP techniques relied heavily on linguistic knowledge in the form of rules or probabilities.
• Nowadays, NLP uses many ML/DL techniques.
page 22
NLP and HCI
• (Written or spoken) language is a way for computers to communicate with people.
• Representing information and using it appropriately can mitigate the errors people perceive.
• NLP + HCI may lead to killer apps.
page 23
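Everywhere?
• Humans are social animals, and language is the tool they use to communicate
• The input and output of information for the brain
• We use language every day and depend on it for our living
• Unable to speak? Unable to hear?
At every moment, everywhere!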
page 24
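Sample Text (Chinese)
• 下雨天留客天留我不留 (one unpunctuated string with several readings)
– 下雨天留客 天留我不留 (“Rain keeps the guest; heaven would keep you, but I will not”)
– 下雨天 留客天 留我不 留 (“A rainy day is a day for keeping guests. Will you keep me? Yes, stay.”)
– 下雨天 留客天 留我不留 (“A rainy day is a day for keeping guests. Will you keep me or not?”)
• 紅鯉魚與綠鯉魚與驢與鯉魚與驢與紅鯉魚與驢與綠鯉魚 (a tongue twister built from “red carp”, “green carp”, and “donkey”)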
page 25
Typical Challenges
• NLU: Natural Language Understanding
• Inference
– 玻璃杯碎了一地 → 玻璃杯不能用了 (“The glass shattered all over the floor” → “The glass can no longer be used”)
• Language change: new words, phrases, and concepts keep emerging.
– Domain: 跆拳道的品勢 (poomsae in taekwondo)
– Social Media: 多多變套套
page 26
http://nlp.stanford.edu/~wcmac/papers/20140716-UNLU.pdf
page 27
Wrap Up -1
• What is NLP?
• What applications are related to NLP?
• NLP and NLU
• What are the current challenges?
• Next, let’s move on to NLP itself!
– introducing the concepts and trying the tools online (where available)
page 28
Section 2
Tools and Resources for Chinese and English Text Processing
page 29
First, get your corpus/datasets/materials ready!
11 December 2016
30
Natural Language Processing
• Basic Functions
– (Word Segmentation)
– Part of Speech Tagging
– (Stemming)
– Named Entity Extraction
– (Syntactic) Parsing
– Coreference resolution
– Text Categorization
page 31
Word Segmentation
• Some written languages have no explicit word
boundary markers, such as Chinese or
Japanese.
• If words are to be the basic units for text
processing, we need to know the boundaries.
• 下雨天留客天留我不留
• 私は自然言語処理を好む
• ‫الطبيعية‬ ‫اللغة‬ ‫معالجة‬ ‫أفضل‬ ‫أنا‬
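One simple way to recover word boundaries is dictionary-based forward maximum matching. The sketch below is only an illustration under stated assumptions: the function name `max_match` and the tiny dictionary `TOY_VOCAB` are hypothetical, and the segmenters introduced later in this section are more sophisticated than this greedy rule.

```python
def max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, just enough for the example sentence.
TOY_VOCAB = {"下雨天", "留客", "留客天", "不留"}

print(max_match("下雨天留客天留我不留", TOY_VOCAB))
```

With a different dictionary the same string splits differently, which is the ambiguity that makes segmentation a real problem.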
page 32
Stemmer (English)
• The process of reducing inflected (or sometimes
derived) words to their word stem, base or root
form—generally a written word form*
*wikipedia
page 33
Stemming:
I love natural language processing. → I love natur languag process .
TF‧IDF (1)
• Widely used in IR
• term frequency × inverse document frequency
• Calculate the weight of each term (usually
words) in a dataset
• An example of how to represent documents
page 34
Term        Antony and Cleopatra   Julius Caesar   The Tempest   Hamlet   Othello   Macbeth
Antony      5.25                   3.18            0             0        0         0.35
Brutus      1.21                   6.1             0             1        0         0
Caesar      8.59                   2.54            0             1.51     0.25      0
Calpurnia   0                      1.54            0             0        0         0
Cleopatra   2.85                   0               0             0        0         0
mercy       1.51                   0               1.9           0.12     5.25      0.88
worser      1.37                   0               0.11          4.15     0.25      1.95
TF‧IDF (2)
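• A very old but effective formula
• The importance of a term is determined by two factors:
– The more often it occurs within a document, the more important it is
– The fewer documents it occurs in, the more important it is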
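The two factors are commonly combined as w = (1 + log tf) × log10(N / df). A minimal sketch of that weighting follows; the function name `tfidf_weights` and the toy documents are hypothetical, and base-10 logarithms are assumed for both factors.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (1 + math.log10(count)) * math.log10(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical toy corpus of three tokenized documents.
docs = [
    ["the", "caesar", "brutus", "caesar"],
    ["the", "caesar", "mercy"],
    ["the", "mercy", "mercy", "worser"],
]
w = tfidf_weights(docs)
print(w[0])
```

A term that occurs in every document ("the" here) gets weight 0, and a term repeated within one document gets a larger weight there than in a document where it occurs once.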
page 35
$$ w_{t,d} = (1 + \log \mathrm{tf}_{t,d}) \times \log_{10}\left( N / \mathrm{df}_t \right) $$
Bag of Words Model
• Often abbreviated as “BOW”
• Words are used as features WITHOUT their order.
• 我給你一百萬 = 你給我一百萬 (“I give you one million” = “You give me one million”)
• Usually used together with N-gram features
我 給 你 一 百 萬 我給 給你 你一 一百
百萬 我給你 給你一 你一百 一百萬
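The n-gram list above can be reproduced with a few lines of code. This is a minimal sketch using character n-grams, as the slide does; the helper names `char_ngrams` and `bag_of_ngrams` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """All contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bag_of_ngrams(text, max_n=3):
    """Unordered counts of all 1..max_n character n-grams."""
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(char_ngrams(text, n))
    return bag


a, b = "我給你一百萬", "你給我一百萬"
print(char_ngrams(a, 2))
# Unigram bags are identical, so a pure bag of words cannot tell the two apart.
print(bag_of_ngrams(a, 1) == bag_of_ngrams(b, 1))
# Adding bigrams restores some of the lost order information.
print(bag_of_ngrams(a, 2) == bag_of_ngrams(b, 2))
```

This is why n-gram features are usually added on top of a plain bag of words.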
page 36
TF-IDF + BOW (uni- and bi-grams) is still the state of the art for some tasks today, or at least comes very close to it, which makes it a very strong baseline.
page 37
• So far we have word-level information.
• Next, we start to add more information on
words and further to larger segments.
page 38
Part of Speech (POS) Tagging
• I love natural language processing.
• (PRP I) (VBP love) (JJ natural) (NN language) (NN processing) (. .)
– PRP: Personal pronoun
– VBP: Verb, non-3rd person singular present
– JJ: Adjective
– NN: Noun, singular or mass
Tags may vary depending on the tag set used; these are from the Penn Treebank tag set.
page 39
Parsing (English)
• Constituent Parse Tree
(ROOT
(S
(NP (PRP I))
(VP (VBP love)
(NP (JJ natural) (NN language))
(NP (NN processing)))
(. .)))
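A bracketed tree like the one above is easy to read programmatically. The following is a minimal sketch, not any parser's official API: `read_tree` and `tagged_words` are hypothetical helper names, and the tree string is copied from the slide.

```python
def read_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples; leaves are strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse()


def tagged_words(tree):
    """Collect (POS tag, word) pairs from the pre-terminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(label, children[0])]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


TREE = """(ROOT (S (NP (PRP I)) (VP (VBP love)
  (NP (JJ natural) (NN language)) (NP (NN processing))) (. .)))"""
print(tagged_words(read_tree(TREE)))
```

The leaves recovered this way are the same tag and word pairs shown on the POS tagging slide.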
page 40
Parsing (English)
• POS
• Dependency Tree
• Dependency Parser
page 41
Semantic Role Labeling
To label the role each word plays in sentences
from the semantic aspect.
https://www.slideshare.net/marinasantini1/semantic-role-labeling
page 42
Parsing (Chinese)
• Stanford Parser (Simplified Chinese)
• Language Cloud (语言云, Language Technology Platform Cloud, LTP-Cloud) (Simplified Chinese)
– HIT-iFLYTEK Language Cloud (哈工大-讯飞语言云, 2014)
– Results are obtained via HTTP requests
• CKIP Parser
page 43
Parsing (Chinese)
• 我爱 自然 语言 处理
• 我爱/VV 自然/NN 语言/NN 处理/NN
• root(ROOT-0, 我爱-1)
• compound:nn(处理-4, 自然-2)
• compound:nn(处理-4, 语言-3)
• dobj(我爱-1, 处理-4)
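• Error propagation! The segmenter merges 我爱 (“I love”) into a single token, and the tagger and parser inherit the mistake.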
page 44
Parsing (Chinese)
• #1:1.[0] S(experiencer:NP(Head:Nhaa:
我)|Head:VL1:愛|reason:NP(property:Na:自然
|Head:Nac:語言)|goal:VP(Head:VC2:處理))#
page 45
Using Tools (1)
• Stanford Parser
• Stanford CoreNLP
(Demo)
• Berkeley Parser
• SRL
(Demo)
page 46
The English tools are so complete. What about Chinese?
page 47
Using Tools (2)
• Jieba (segmentation, python codes)
– HMM/Viterbi algorithm
• CKIP
– Chinese Segmentor/POS Tagger
– Parser
page 48
• For the traditional Chinese text environment…
NLP Tools Comparison
page 49
Tools compared: Stanford CoreNLP, Jieba, CKIP
Criteria: language support, ease of use, domain adaptation, performance, price
Using Tools (3)
• NLTK (python): tokenize, tag, NE extraction,
show parsing trees
– Porter stemmer
– n-grams
• TF-IDF is not in NLTK; use scikit-learn (machine learning in Python).
page 50
Semantic Resources
• Wordnet (English) online demo
• Freebase (English): API shutdown on Aug 31,
2016 => Google’s knowledge graph
• Hownet (Simplified Chinese)
• E-hownet (Traditional Chinese)
page 51
Word Embeddings (1)
How word embeddings differ from earlier word vectors:
they support semantic arithmetic: king + woman – man = queen
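The arithmetic can be illustrated with hand-made vectors. This is only a toy sketch: the 4-dimensional vectors in `TOY_VECTORS` are invented for illustration (real embeddings are learned from corpora and have hundreds of dimensions), and the helper names `cosine` and `analogy` are hypothetical.

```python
import math

# Hypothetical toy vectors; the dimensions loosely mean (royalty, male, female, fruit).
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "apple": [0.0, 0.1, 0.1, 0.9],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(positive, negative, vectors):
    """Return the word closest to sum(positive) - sum(negative), excluding the query words."""
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    candidates = (w for w in vectors if w not in positive and w not in negative)
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy(["king", "woman"], ["man"], TOY_VECTORS))
```

With pre-trained embeddings the same nearest-neighbour search is run over the whole vocabulary.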
page 52
Word Embeddings (2)
Pre-trained or train by yourself!
• w2v
• Glove
What if I don’t know deep learning?
You can find various embeddings on the Web.
[Check here!]
page 53
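Now I know this information and these processing methods.
What next?
• You can do
– Information extraction: even if you know no learning methods at all, you can write rules to extract the information you need!
– Machine learning: if you know a little machine learning, the text processing methods just introduced give you many kinds of information to use as features for training a model, such as word frequency, importance (weight), linguistic features (POS), sentence structure (parse tree), semantics (semantic ontology, word embedding), and so on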
page 54
NLP Tasks
• Most of them can be transformed into
– Classification problem
– Clustering problem
– (Sequential) Labeling problem
page 55
You can now carry out basic natural language processing tasks!
page 56
Wrap Up – Part II
• For the English and Chinese languages
• Pre-processing tools
• Syntactic analysis tools
• Semantic analysis tools
page 57
Part III
Challenges of NLP on the Web and Social Media
page 58
1. WWW/Social Media NLP
2. Sentiment Analysis Tool
page 59
Not only texts…
Created by Freepik
page 60
Money
Network
Sentiment User
Differences
• Web or social texts are in a written form of the
spoken language.
– New words
– Typos
– Urban language
– Cyber language
– Abbreviations
– A lot of (homo)phonic/semantic puns
– Foreign languages (激安殿堂, 牛逼)
page 61
If we just treat them as pure
texts…
• 八百屋的健太和大蔥女部分幫整個
劇情超加分
• 而且兩位演技都很好呀!最喜歡一
幕是
• 健太知道大蔥女的真面目後,在大
蔥女再來買蔥完要離開時
• 健太衝出去要追問的樣子,一副欲
言又止的臉
• 大蔥女也一副等待著健太說出來整
個很曖昧的畫面
• 八百(Neu) 屋(Na) 的(DE) 健(VH) 太(Dfa) 和
(P) 大蔥女(Na) 部分(Neqa) 幫(P) 整個(Neqa)
劇情(Na) 超(VJ) 加分(VB) 而且(Cbb) 兩(Neu)
位(Nf) 演技(Na) 都(D) 很(Dfa) 好(VH) 呀
(T) !(EXCLAMATIONCATEGORY)
• ---------------------------------------------------------------------
-------------------------------------------------------------
• 最(Dfa) 喜歡(VK) 一(Neu) 幕(Nf) 是(SHI)
健(VH) 太(Dfa) 知道(VK) 大蔥女(Na) 的(DE)
真面目(Na) 後(Ng) ,(COMMACATEGORY)
• ---------------------------------------------------------------------
-------------------------------------------------------------
• 在(P) 大蔥女(Na) 再(D) 來(D) 買蔥完(VC)
要(D) 離開(VC) 時(Ng) 健(VH) 太(Dfa) 衝出
去(VA) 要(D) 追問(VE) 的(DE) 樣子(Na) ,
(COMMACATEGORY)
• ---------------------------------------------------------------------
-------------------------------------------------------------
• 一(Neu) 副(Nf) 欲言又止(VH) 的(DE) 臉(Na)
大蔥女(Na) 也(D) 一(Neu) 副(Nf) 等待(VK)
著(Di) 健(VH) 太(Dfa) 說出來(VB) 整個(Neqa)
很(Dfa) 曖昧(VH) 的(DE) 畫面(Na)
• ---------------------------------------------------------------------
-------------------------------------------------------------
page 62
Stanford vs. CKIP
• 八百(Neu) 屋(Na)
的(DE) 健(VH)
太(Dfa) 和(P) 大
蔥女(Na) 部分
(Neqa) 幫(P) 整
個(Neqa) 劇情
(Na) 超(VJ) 加
分(VB)
• 八百/CD 屋/NN 的
/DEG 健太/NR 和
/CC 大葱/NR 女/JJ
部分/NN 帮/VV 整
个/DT 剧情/NN 超
加分/NN
page 63
More Preprocessing Needed
• Need to filter out dirty texts and find the major
content.
– Texts for ads
– Texts for format
• Need to cut sentences first before sending
them into the parser.
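Cutting sentences can be sketched with a regular expression over sentence-final punctuation. This is a minimal illustration: the function name `split_sentences` is hypothetical, and the punctuation set is an assumption that would need tuning for real web text, which often lacks punctuation altogether.

```python
import re

# Sentence-final punctuation, full-width and ASCII (an assumed, incomplete set).
_SENT_END = re.compile(r"([。!?!?]+)")


def split_sentences(text):
    """Split text after runs of sentence-final punctuation, keeping the punctuation."""
    parts = _SENT_END.split(text)
    sentences = []
    # parts alternates: text, punctuation, text, punctuation, ..., trailing text.
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    tail = parts[-1].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("而且兩位演技都很好呀!最喜歡一幕是"))
```

Each resulting piece can then be sent to the segmenter or parser separately.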
page 64
Skills We Might Need
• Text Normalization
• Multimedia multimodal
• User and Text Networking
• Social Network
page 65
Text Normalization
• Normalization converts text written in web language into formal language before further processing.
• 私心喜翻的日式簡約風 → 私心喜歡的日式簡約風
• 想一起去ㄉ水水們 → 想一起去的漂亮女生們
• 漂漂是今年才成為麻麻 → 漂漂是今年才成為媽媽
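The simplest form of normalization is a lookup table. The sketch below covers only the three examples above; `NORMALIZATION_TABLE` and `normalize` are hypothetical names, and a realistic system would need context to avoid rewriting strings that merely look like slang.

```python
# Hypothetical lookup table built from the three examples above.
NORMALIZATION_TABLE = {
    "喜翻": "喜歡",
    "ㄉ": "的",
    "水水": "漂亮女生",
    "麻麻": "媽媽",
}


def normalize(text, table=NORMALIZATION_TABLE):
    """Replace informal web spellings with their formal equivalents."""
    # Replace longer keys first so a short key never splits a longer match.
    for informal in sorted(table, key=len, reverse=True):
        text = text.replace(informal, table[informal])
    return text


for raw in ["私心喜翻的日式簡約風", "想一起去ㄉ水水們", "漂漂是今年才成為麻麻"]:
    print(raw, "->", normalize(raw))
```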
page 67
Top Ten Netizen Buzzwords of 2017
• #1 低能卡
• #2 垃圾不分藍綠
• #3 我難過
• #4 這我一定吉
• #5 發錢
• #6 8+9
• #7 銅鋰鋅
• #8 下去領500
• #9 海水退潮就知道誰沒穿褲子
• #10 少時不讀書,長大當記者
• Bonus: 廠廠
Processing Web Text: do we need normalization?
page 69
Or, A Parser for Web Text
• Tweet POS Tagger/Parser like: ARK
• Train with web texts to capture their characteristics.
ikr smh he asked fir yo last name
so he can add u on fb lololol
• Unfortunately, so far we don’t have any for the
Chinese language.
page 70
Skills We Might Need
• Text Normalization
• Multimedia multimodal
• User and Text Networking
• Social Network
page 71
Open any article on the web
• http://linshibi.com/
page 72
Skills We Might Need
• Text Normalization
• Multimedia multimodal
• User and Text Networking
• Social Network
page 73
User and Text Network (1)
• We can observe this kind of network in all social media with a forum-like style.
page 74
User and Text Network (2)
page 75
User and Text Network (2)
page 76
• We will explain the way to utilize the concept
of user and text network using the UTCNN
model in the following sentiment package.
page 77
Sentiment Analysis
page 78
Sentiment Analysis Is…
• The study of opinions, sentiments, subjectivity, affect, emotions, views, etc. in text such as news, blogs, reviews, comments, dialogs, or other kinds of documents.
• An important research question:
– Sentiment information is global and powerful.
– Sentiment information is valuable for companies,
customers and personal communication.
page 79
Sentiment Representation
• Categorical
– Sentiment, non-sentiment
– Positive, neutral, negative
– Stars
– Emotion categories such as Joy, Anger, Sadness…
• Dimensional
– Valence, Arousal
page 80
CSentiPackage
@NLPSA
page 81
CSentiPackage
• Datasets
– Chinese Morphological Dataset Cmorph (former version of ACBiMA)*
– Chinese Opinion Treebank
• Resources
– NTUSD/ANTUSD
• Tools
– CopeOpi + Tag Mapping File
– UTCNN
*https://github.com/windx0303/ACBiMA
page 82
Statistics
• NTUSD: Sentiment Dictionary (with 10,371
words): free for research, 400+ applications
• ANTUSD: Augmented NTUSD (with 27,221
words, now integrating with e-Hownet)
• Cmorph (with 8,000+ words) -> ACBiMA
(with 11,000+ words)
• Chinese Opinion Treebank: labels on Chinese
Treebank 5.1
page 83
Materials:
From Words to Sentences
• NTUSD: words (binary sentiment)
• ANTUSD: words (annotation features)
• Chinese Morphological Dataset: words
(morphological structures)
• Chinese Opinion Treebank: phrases (sentence
structure)
• Chinese Opinion Treebank: sentences (binary
sentiment)
page 84
Tools:
From Words to Sentences,
Documents, and Beyond
• CopeOpi Sentiment Scoring Tool: words,
sentences, documents, documents+ (text)
• UTCNN: posts and users (text and social
media)
page 85
NTUSD
• Simplified Chinese and traditional Chinese
versions
• A positive word collection of 2,812 words
• A negative word collection of 8,276 words
• No degree, no estimated scores and other
information.
page 86
ANTUSD
• 6 Fields
– CopeOpi Score
– Number of positive annotation
– Number of neutral annotation
– Number of negative annotation
– Number of non-sentiment annotation
– Number of not-a-word annotation
• Not-a-word: useful, as the entries are collected from real segmented data
開心 0.434168 1 0 0 0 0
酣聲 0 0 0 1 3 0
憤怒 -0.80011 0 0 5 0 0
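The three sample rows can be read into records as follows. This is a sketch under an assumption: the fields are taken to be whitespace-separated in the order listed on the slide, and the `AntusdEntry` class and `parse_entry` function are hypothetical, not part of the released package.

```python
from dataclasses import dataclass


@dataclass
class AntusdEntry:
    word: str
    copeopi_score: float
    positive: int
    neutral: int
    negative: int
    non_sentiment: int
    not_a_word: int


def parse_entry(line):
    """Parse one row: the word followed by the six fields."""
    # Split from the right so that only the last six tokens are treated as fields.
    word, score, *counts = line.rsplit(None, 6)
    return AntusdEntry(word, float(score), *map(int, counts))


rows = ["開心 0.434168 1 0 0 0 0", "酣聲 0 0 0 1 3 0", "憤怒 -0.80011 0 0 5 0 0"]
for row in rows:
    print(parse_entry(row))
```

Under this reading, 憤怒 received five negative annotations, and 酣聲 was mostly judged non-sentiment.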
page 87
ANTUSD
• Also contains short phrases such as 一昧要求, 一路過關斬將, and 備受外界期待…
page 88
ANTUSD and E-HOWNET
• An integration of two resources which may help us play with
sentiment and semantics.
• Related English resource: SentiWordnet
– Refer to Wordnet
– With PosScore and NegScore added
– ObjScore = 1-(PosScore+NegScore)
E-HowNet
– A frame-based entity-relation model extended from HowNet
– Defines lexical senses (concepts) in a hierarchical manner
– Now integrated with ANTUSD; covers 47.7% of the words in ANTUSD
page 89
ANTUSD in E-HOWNET
page 90
page 91
Chinese Morphological Structure
• Parallel type: 財富 (rich wealth)
• Substantive-Modifier type: 痛哭 (bitterly cry)
• Subjective-Predicate type: 山崩 (land slip; landslide)
• Verb-Object type: 避暑 (escape from summer)
• Verb-Complement type: 提高 (increase: raise up)
• Negation type: 無情 (no feelings)
• Confirmation type: 有心 (have heart)
• Others
page 92
Chinese Opinion Treebank
• Based on Chinese Treebank 5.1.
• Includes the opinion labels of each sentence.
• Includes the word pairs and their composition types in opinionated sentences.
• To avoid copyright issues, you need to obtain Chinese Treebank 5.1 yourself in order to use the Chinese Opinion Treebank!
page 93
Chinese Opinion Treebank
S ID=230: 黄河“金三角”成为新的投资热点
File formats:
– .node file fields: Node ID, POS, node content, node depth
– .tree file fields: Node ID: children
– .trio file fields: Trio ID, trio head, trio left node, trio right node, trio type
Content (.node file, followed by .tree file and .trio file)
0,,,0
1,IP-HLN,,1
2,NP-SBJ,,2
3,NP-PN,,3
4,NR,黄河,4
5,NP,,3
6,PU,“,4
7,NN,金三角,4
8,PU,”,4
9,VP,,2
10,VV,成为,3
11,NP-OBJ,,3
12,CP,,4
13,WHNP-1,,5
14,-NONE-,*OP*,6
15,CP,,5
16,IP,,6
17,NP-SBJ,,7
18,-NONE-,*T*-1,8
19,VP,,7
20,VA,新,8
21,DEC,的,6
22,NP,,4
23,NN,投资,5
24,NN,热点,5
0:1,
1:2,9,
2:3,5,
3:4,
4:
5:6,7,8,
6:
7:
8:
9:10,11,
10:
11:12,22,
12:13,15,
13:14,
14:
15:16,21,
16:17,19,
17:18,
18:
19:20,
20:
21:
22:23,24,
23:
24:
2,1,2,9,3
3,22,23,24,2
Opinion labels of three annotators
(filename, SID, opinion, polarity, opinion type)
chtb_020.raw,230,N,,
chtb_020.raw,230,Y,POS,STATUS
chtb_020.raw,230,Y,POS,STATUS
Opinion gold standard
chtb_020.raw,230,Y,POS,STATUS
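The .node and .tree listings above can be joined to walk the tree. The sketch below uses only the subject noun phrase (nodes 2 to 8) copied from the listing; the function names are hypothetical, and the field layout is assumed to be exactly as the slide describes.

```python
def parse_nodes(lines):
    """Parse .node lines of the form: node ID, POS, node content, node depth."""
    nodes = {}
    for line in lines:
        node_id, pos, rest = line.split(",", 2)
        # Split the depth off from the right, in case the content itself is a comma.
        content, depth = rest.rsplit(",", 1)
        nodes[int(node_id)] = {"pos": pos, "content": content, "depth": int(depth)}
    return nodes


def parse_children(lines):
    """Parse .tree lines of the form: node ID, a colon, then comma-terminated child IDs."""
    children = {}
    for line in lines:
        node_id, _, kids = line.partition(":")
        children[int(node_id)] = [int(k) for k in kids.split(",") if k]
    return children


def leaves(node_id, nodes, children):
    """Words under a node, left to right, skipping empty (-NONE-) elements."""
    kids = children.get(node_id, [])
    if not kids:
        node = nodes[node_id]
        if node["pos"] == "-NONE-" or not node["content"]:
            return []
        return [node["content"]]
    words = []
    for kid in kids:
        words.extend(leaves(kid, nodes, children))
    return words


NODE_LINES = ["2,NP-SBJ,,2", "3,NP-PN,,3", "4,NR,黄河,4", "5,NP,,3",
              "6,PU,“,4", "7,NN,金三角,4", "8,PU,”,4"]
TREE_LINES = ["2:3,5,", "3:4,", "4:", "5:6,7,8,", "6:", "7:", "8:"]

nodes = parse_nodes(NODE_LINES)
children = parse_children(TREE_LINES)
print("".join(leaves(2, nodes, children)))
```

Trio lines refer to the same node IDs, so a trio's head, left, and right nodes can be resolved to text with the same `leaves` helper.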
page 94
Notation (Parsing Tree)
• T: the parse tree of a sentence S
• O = {o1, o2, …}: in-order set of tree nodes
• tri: an opinion trio
• Rpt: a syntactic inter-word relation
Rpt ∈ {Substantive-Modifier, Subjective-Predicate, Verb-Object, Verb-Complement, Other}
Tri(S) =
1, IP, 活动, VP, Subjective-Predicate
2, VP, 取得, NP-OBJ, Verb-Object
3, NP-OBJ, 圆满, 成功, Substantive-Modifier
page 95
Chinese Opinion Treebank
• Align the opinion labels of sentences to
Chinese Treebank 5.1 by sentence IDs.
• Align Opinion trios to Chinese Treebank 5.1
by node IDs.
• Can be used to do opinion cause analysis.
page 96
CopeOpi
• A statistical sentiment analysis tool
• Can be used without any training
• Users can update character weights or add any
sentiment words
• It runs fast.
page 97
The First Idea
• Chinese characters are mostly morphemes and they
bear sentiment, too.
• Simple example: some characters are preferred for
naming, but some are not.
• For example, 德 (ethics), 胜 (win), and 高 (high) are good for names; 笨 (stupid), 悲 (sorrow), and 惨 (terrible) are not good choices for names.
• There are exceptions, such as 徐悲鸿, but the approach is still quite reliable if the sentiment of characters is acquired statistically from a large naming corpus (or just from sentiment dictionaries).
page 98
[仇 (-1.0) + 視 (0.0)] / 2 = -1/2 = -0.5 (NEG)
[富(1.0) + 貴(0.936)] / 2 = 0.968 (POS)
好人、美麗、憤怒、弱小…
$$ P_{c_i} = \frac{fp_{c_i} / \sum_{j=1}^{n} fp_{c_j}}{fp_{c_i} / \sum_{j=1}^{n} fp_{c_j} + fn_{c_i} / \sum_{j=1}^{m} fn_{c_j}} $$
$$ N_{c_i} = \frac{fn_{c_i} / \sum_{j=1}^{m} fn_{c_j}}{fp_{c_i} / \sum_{j=1}^{n} fp_{c_j} + fn_{c_i} / \sum_{j=1}^{m} fn_{c_j}} $$
$$ S_{c_i} = P_{c_i} - N_{c_i} $$
$$ S_w = \frac{1}{p} \sum_{j=1}^{p} S_{c_j} $$
Bag of Unit
page 99
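The character-level scoring can be sketched in a few lines. This is an illustration, not the CopeOpi implementation: it assumes fp and fn are the counts of a character in lists of positive and negative words, it scores unseen characters as 0, and the tiny word lists and function names are hypothetical.

```python
from collections import Counter


def character_scores(positive_words, negative_words):
    """Score each character in [-1, 1] from its normalized positive and negative frequencies."""
    fp = Counter(ch for word in positive_words for ch in word)
    fn = Counter(ch for word in negative_words for ch in word)
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    scores = {}
    for ch in set(fp) | set(fn):
        p_rate = fp[ch] / total_fp  # share of all positive character occurrences
        n_rate = fn[ch] / total_fn  # share of all negative character occurrences
        p = p_rate / (p_rate + n_rate)
        n = n_rate / (p_rate + n_rate)
        scores[ch] = p - n
    return scores


def word_score(word, scores):
    """Average the character scores; characters never seen count as 0 (an assumption)."""
    return sum(scores.get(ch, 0.0) for ch in word) / len(word)


# Hypothetical toy dictionaries, standing in for a real sentiment dictionary.
scores = character_scores(["富貴", "美麗"], ["仇恨", "憤怒"])
print(word_score("仇視", scores))
```

Dividing by the totals keeps the score from being dominated by whichever list is larger, which matters for dictionaries with far more negative than positive words.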
Aggregation
• Word sentiment
– Summing up opinion scores of characters
• Sentence sentiment
– Summing up opinion scores of words
So is there any way we can give them weights?
page 100
• Linguistic Information:
– Morphological structures
• Intra-word structures
– Sentence syntactic structures
• Inter-word structures
Weighted by Structures
page 101
Linguistic Morpho. Type Example
1. Parallel 財富、打罵
2. Substantive-Modifier 低級、痛哭
3. Subjective-Predicate 心疼、氣虛
4. Verb-Object 失控、免職
5. Verb-Complement 看清、擊潰
Opinion Morpho. Type Example
6. Negation 無法、不慎
7. Confirmation 有賴、有愧
8. Others 姪子、薄荷
Types can be obtained with an SVM, a CRF, handcrafted rules, …
Morphological Structure
page 102
Example of Sentiment Trios in
Chinese Opinion Treebank
Linguistic Morpho. Type Example
Parallel (Skip) 美麗而聰慧
1. Substantive-Modifier 高大的樓房
2. Subjective-Predicate 學習認真
3. Verb-Object 恢復疲勞
4. Verb-Complement 收拾乾淨
Morpho. Type Opinion Example
n. Others 為…/以…
page 103
Compositional
Chinese Sentiment Analysis
• Example:氣虛
• Subjective-Predicate type
• 氣 0.5195
• 虛 -0.8178
• Score(氣虛) = -0.8178
page 104
• Example:看清、看壞
• Verb-Complement type
• 看: 0.1
• 清: 0.8032
• 壞: -0.9
• Score(看清) = 0.8072
• Score(看壞) = -0.9
Example of Using Sentiment Trios
• Score: 0.6736
page 105
Substantive-Modifier type:
if (S(C1) [?] 0 and S(C2) [?] 0) then
  if (S(C1) [?] 0 and S(C2) [?] 0) then S(C1C2) = S(C1)
  else S(C1C2) = -1 [?] S(C1)
else S(C1C2) = S(C1) [?] S(C2)
Verb-Object type:
if (S(C1) [?] 0 and S(C2) [?] 0)
then S(C1C2) = S(C1) [?] SIGN(S(C1)) [?] SIGN(S(C2))
else S(C1C2) = S(C1) [?] S(C2)
0.3018
0.6736
0.4109
0.6736
Preprocessing
• Tokenize (segmentation)
– Jieba
– CKIP
– Stanford parser
• Part-of-speech tagging
– CKIP
– Stanford parser
Tokenization is mandatory; we will release a version in which it is optional in the future.
page 106
CopeOpi – example
• $ ./run_trad.sh
– Run the CopeOpi with the files in the list “file.lst”
• Check the results in out/0001.txt
page 107
test_trad.txt 0001
CopeOpi – example
• Result summary in ./out.csv
page 108
Deep Neural Network Example
Word
• Morphological structure
for a better
word representation.
• Same idea but
for *Chinese sentiment
analysis*
• Luong, Thang, Richard Socher, and Christopher D. Manning. "Better Word Representations with Recursive Neural Networks
for Morphology." CoNLL. 2013.
page 109
Deep Neural Network Example
Sentence
• Learned composition function (of semantics): Richard Socher (RNN, a series of work since 2011)
page 110
Learning by Neural Network
• Word Sentiment
• Sentence Sentiment
• Document Sentiment
• Social Media Post Sentiment
page 111
Learning by Deep Neural Network
• Word Sentiment: CNN + ANTUSD
• Sentence Sentiment
• Document Sentiment
• Social Media Post Sentiment: Text + User
Context
– Structures are not yet considered!
page 112
CSentiPackage: UTCNN
Learning by Deep Neural Network
• Word Sentiment: CNN + ANTUSD
• Sentence Sentiment
• Document Sentiment
• Social Media Post Sentiment: Text + User
Context
page 113
User Topic Comment Neural Network
(UTCNN)
• A deep learning model of stance classification
on social media text
page 114
Deep Learning Model
Authors
Likers
Post content
Comment content
Commenters
Topics
UTCNN
• Stance tendency
– Author
– Liker
– Topic
– Commenter
• Semantic preference
– Author
– Liker
– Topic
– Commenter
page 115
We should reject the re-construction
of the Nuclear power plant.
Great! ( )
NO! ……
(post)
(comment)
If you don’t know anything about deep learning
(again) …
– I won’t talk too much about it. No worries.
– You can take the courses organized by 臺灣資料
科學協會
– Knowing that it’s a DNN Chinese sentiment model
for now is enough.
page 116
Social Media Dataset Released
in CSentiPackage
• Facebook fan groups (Chinese)
– Author/liker/comment/commenter
– Single topic (learn latent topics by LDA)
– Unbalanced
– Chinese
• Create Debate (English)
– Author
– Four topics
– Balanced
– English
page 117
Environment
• Software
– OS: Linux
– Programming language
• Java 6 or higher
• python 2.7
– Theano 0.8.2
– Keras 1.0.3
– sklearn
• Hardware
– Graphic cards (deep learning)
page 118
Demo Environment
• CPU
– Intel Xeon E5-2630 v3 ×2
• RAM
– 64 GB
• OS
– Ubuntu 14.04 LTS
• Graphic cards
– Nvidia Tesla K40 ×2
page 119
UTCNN - data
page 120
• 3 46 57 … 573 49 61 4 -1 <sssss>福 島 核電廠 的
熔 毀 核 燃料棒 到底 有沒有 掉到 地下水層 …..<sssss>詳
見 俄國 時報 電視 專訪 <sssss> 544 490 565 … 428
危機 ,如果 安全 你 家 借放 ,事實 是 沒有 人 知道 真相 這
些 都 只是 推論 就 看 誰 的 推論 有 根據 合理 奇怪 的 是
擁核 五 毛 只 根據 東京 電力 的 說法 而 東京 電力 是 最
有 利益 關係 最 有 企圖 掩藏 事實 的 事主 貼 此 文 是 提
供 大家 獨立 沒有 核電 利益 纏身 的 核工 專家 與 小出裕
章 的 推論 僅 供 參考
UTCNN - demo
page 121
http://doraemon.iis.sinica.edu.tw/wordforce/
UTCNN - demo
page 122
http://doraemon.iis.sinica.edu.tw/wordforce/
Something Important About
CSentiPackage
page 123
• The CSentiPackage you obtained is for your group’s research use only.
• It has been officially released, so it can be downloaded at any time.
• Download or check what’s new @
http://academiasinicanlplab.github.io/
• Find the tutorial materials of CSentiPackage @
http://www.lunweiku.com/
Skills We Might Need
• Text Normalization
• Multimedia multimodal
• User and Text Networking
• Social Network
page 124
NLP and Social Network
• NLP sometimes serves as the pre-processing of
the social network research to deal with
unstructured data.
• NLP in social media is sometimes referred to as Social Media Analytics
• NLP models can help find information such as
events, sentiment, named entities for social
network analysis
• The network analysis algorithm can benefit NLP
research by bringing in heterogeneous features.
page 125
Challenges
• Integrating features is not easy
• Integrating knowledge is not easy, either
• The data are large, and there is a tradeoff between performance and efficiency.
• Social media are always changing and
different over generations.
• Visualizing both texts and the network is
challenging.
page 126
Wrap Up – Part III
• More context, more to know
• More context, better for guessing
• Inner context, outer context, inter context
• Pay more attention to the relations
page 127
Part IV
NLP Trends and Industry Applications
page 128
1. Industrial Needs and Apps
2. Future Trend
page 129
Industrial Needs
• Techniques can make
money
• Techniques can provide
better services (then to
make money)
• Techniques can make
users engaged (then to
make money)
page 130
Applications
• Ads
• Recommendation
• QA
• Interface: Chatbot
page 131
Advertisement
The most direct way to make profit
page 132
Ads (1)
• Google AdSense
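– How AdSense works: website owners can use Google AdSense to earn money from their own online content. AdSense serves suitable text and multimedia ads based on your site’s content and visitors. These ads are created and paid for by advertisers who want to promote their products; because advertisers pay different amounts for different ads, the amount you earn also varies.
• Advertising market share: Google + FB account for 90%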
• But there is very little you can do (with NLP).
page 133
Other Common Forms of Website Advertising
• Content sites: advertorials presented as recommendations
page 134
Recommendation (Product Recommendation)
• Content-based
• Collaborative filtering
• User behavior
NLP techniques are needed mostly for content-
based (items in e-commerce websites).
page 135
• User behavior can be related to unstructured
data.
page 136
page 137
Mobile: Apps Recommendation (1)
page 138
Descriptions
Review
Users
Others
Images
Images
Mobile: Apps Recommendation (2)
• Grouping them with similarity (like
communication) or events (like travel).
page 139
Chatbot: Where is my Dr. Know?
A new interface connected to understanding and
text generation.
page 140
page 141
Two major purposes of chatbot
• Chit-chat
• Task-oriented
The most natural kind is mixed somehow.
page 142
Four major types of functions
• Assistant (MS Cortana)
• Companion (MS XiaoIce, 小冰)
• Customer service (JD.com JIMI)
• Question answering (IBM Watson)
page 143
Chatbot
• Retrieval based
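– Principle: reply with the (most similar) sentence that people typically say next
– Advantage: every sentence was actually said by a person, so responses rarely have grammatical problems
• Generation based
– Principle: most current generation-based models are implemented with deep learning; they learn the encoding-decoding relation between the previous sentence and the current one to generate the best response.
– Advantage: can produce new responses never seen in the corpus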
page 144
Chatbot
• Slot filling:
– Sequential tagging
– templates
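Template-based slot filling can be sketched with a regular expression whose named groups are the slots. The template, the slot names (`origin`, `destination`, `date`), and the function `fill_slots` below are all hypothetical examples for a flight-booking bot.

```python
import re

# Hypothetical template; each named group is one slot.
TEMPLATE = re.compile(
    r"book a flight from (?P<origin>\w+) to (?P<destination>\w+)(?: on (?P<date>\w+))?",
    re.IGNORECASE,
)


def fill_slots(utterance):
    """Return the filled slots as a dict, or None if the template does not match."""
    match = TEMPLATE.search(utterance)
    if match is None:
        return None
    # Optional slots that were not mentioned are left out.
    return {name: value for name, value in match.groupdict().items() if value is not None}


print(fill_slots("Please book a flight from Taipei to Tokyo on Friday"))
```

A bot would then ask a follow-up question for any required slot that is still missing; the sequential tagging approach replaces the hand-written template with a learned tagger.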
page 145
Chatbot
• Api.ai: template/rule-based
page 146
Chatbot Challenges
• It is difficult to cross domain.
• Needs very big data
• It is challenging to connect to the background
knowledge.
However, chatbots perform satisfactorily as small, limited bots. Many Facebook stores use this kind of chatbot to sell things and provide services.
page 147
Future Trend
• Application oriented NLP
– (character-based, no more segmentation/parsing…)
• Semantic oriented NLP
• Language independent NLP
• Multi-modal NLP
• Multi-sourced/featured NLP
• Knowledge empowered NLP
page 148
Final Wrap Up
• You now know what NLP is
• You have seen the major NLP tools
• You have heard about the cool things NLP can do
• Start NLP today!
page 149
Thank You
Q&A
page 150
[系列活動] 使用 R 語言建立自己的演算法交易事業台灣資料科學年會
 
[系列活動] 智慧製造與生產線上的資料科學 (製造資料科學:從預測性思維到處方性決策)
[系列活動] 智慧製造與生產線上的資料科學 (製造資料科學:從預測性思維到處方性決策)[系列活動] 智慧製造與生產線上的資料科學 (製造資料科學:從預測性思維到處方性決策)
[系列活動] 智慧製造與生產線上的資料科學 (製造資料科學:從預測性思維到處方性決策)台灣資料科學年會
 
[系列活動] 智慧城市中的時空大數據應用
[系列活動] 智慧城市中的時空大數據應用[系列活動] 智慧城市中的時空大數據應用
[系列活動] 智慧城市中的時空大數據應用台灣資料科學年會
 
給軟體工程師的不廢話 R 語言精要班
給軟體工程師的不廢話 R 語言精要班給軟體工程師的不廢話 R 語言精要班
給軟體工程師的不廢話 R 語言精要班台灣資料科學年會
 
[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies台灣資料科學年會
 
[系列活動] 手把手教你R語言資料分析實務
[系列活動] 手把手教你R語言資料分析實務[系列活動] 手把手教你R語言資料分析實務
[系列活動] 手把手教你R語言資料分析實務台灣資料科學年會
 
[系列活動] Data exploration with modern R
[系列活動] Data exploration with modern R[系列活動] Data exploration with modern R
[系列活動] Data exploration with modern R台灣資料科學年會
 
[系列活動] 給工程師的統計學及資料分析 123
[系列活動] 給工程師的統計學及資料分析 123[系列活動] 給工程師的統計學及資料分析 123
[系列活動] 給工程師的統計學及資料分析 123台灣資料科學年會
 
[系列活動] 一天搞懂對話機器人
[系列活動] 一天搞懂對話機器人[系列活動] 一天搞懂對話機器人
[系列活動] 一天搞懂對話機器人台灣資料科學年會
 
認知神經科學x人工智慧-黃從仁
認知神經科學x人工智慧-黃從仁認知神經科學x人工智慧-黃從仁
認知神經科學x人工智慧-黃從仁Tren Huang
 
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用台灣資料科學年會
 
[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用台灣資料科學年會
 
曾韵/沒有大數據怎麼辦 ? 會計師事務所的小數據科學
曾韵/沒有大數據怎麼辦 ? 會計師事務所的小數據科學曾韵/沒有大數據怎麼辦 ? 會計師事務所的小數據科學
曾韵/沒有大數據怎麼辦 ? 會計師事務所的小數據科學台灣資料科學年會
 
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探台灣資料科學年會
 
NTU ML TENSORFLOW
NTU ML TENSORFLOWNTU ML TENSORFLOW
NTU ML TENSORFLOWMark Chang
 

Viewers also liked (20)

[系列活動] Python爬蟲實戰
[系列活動] Python爬蟲實戰[系列活動] Python爬蟲實戰
[系列活動] Python爬蟲實戰
 
[系列活動] 一日搞懂生成式對抗網路
[系列活動] 一日搞懂生成式對抗網路[系列活動] 一日搞懂生成式對抗網路
[系列活動] 一日搞懂生成式對抗網路
 
[系列活動] Python 程式語言起步走
[系列活動] Python 程式語言起步走[系列活動] Python 程式語言起步走
[系列活動] Python 程式語言起步走
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程
 
[系列活動] 使用 R 語言建立自己的演算法交易事業
[系列活動] 使用 R 語言建立自己的演算法交易事業[系列活動] 使用 R 語言建立自己的演算法交易事業
[系列活動] 使用 R 語言建立自己的演算法交易事業
 
[系列活動] 智慧製造與生產線上的資料科學 (製造資料科學:從預測性思維到處方性決策)
[系列活動] 智慧製造與生產線上的資料科學 (製造資料科學:從預測性思維到處方性決策)[系列活動] 智慧製造與生產線上的資料科學 (製造資料科學:從預測性思維到處方性決策)
[系列活動] 智慧製造與生產線上的資料科學 (製造資料科學:從預測性思維到處方性決策)
 
[系列活動] 機器學習速遊
[系列活動] 機器學習速遊[系列活動] 機器學習速遊
[系列活動] 機器學習速遊
 
[系列活動] 智慧城市中的時空大數據應用
[系列活動] 智慧城市中的時空大數據應用[系列活動] 智慧城市中的時空大數據應用
[系列活動] 智慧城市中的時空大數據應用
 
給軟體工程師的不廢話 R 語言精要班
給軟體工程師的不廢話 R 語言精要班給軟體工程師的不廢話 R 語言精要班
給軟體工程師的不廢話 R 語言精要班
 
[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies
 
[系列活動] 手把手教你R語言資料分析實務
[系列活動] 手把手教你R語言資料分析實務[系列活動] 手把手教你R語言資料分析實務
[系列活動] 手把手教你R語言資料分析實務
 
[系列活動] Data exploration with modern R
[系列活動] Data exploration with modern R[系列活動] Data exploration with modern R
[系列活動] Data exploration with modern R
 
[系列活動] 給工程師的統計學及資料分析 123
[系列活動] 給工程師的統計學及資料分析 123[系列活動] 給工程師的統計學及資料分析 123
[系列活動] 給工程師的統計學及資料分析 123
 
[系列活動] 一天搞懂對話機器人
[系列活動] 一天搞懂對話機器人[系列活動] 一天搞懂對話機器人
[系列活動] 一天搞懂對話機器人
 
認知神經科學x人工智慧-黃從仁
認知神經科學x人工智慧-黃從仁認知神經科學x人工智慧-黃從仁
認知神經科學x人工智慧-黃從仁
 
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
 
[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用
 
曾韵/沒有大數據怎麼辦 ? 會計師事務所的小數據科學
曾韵/沒有大數據怎麼辦 ? 會計師事務所的小數據科學曾韵/沒有大數據怎麼辦 ? 會計師事務所的小數據科學
曾韵/沒有大數據怎麼辦 ? 會計師事務所的小數據科學
 
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
 
NTU ML TENSORFLOW
NTU ML TENSORFLOWNTU ML TENSORFLOW
NTU ML TENSORFLOW
 

Similar to [系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹

Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPMENGSAYLOEM1
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introductionRobert Lujo
 
Text Representations for Deep learning
Text Representations for Deep learningText Representations for Deep learning
Text Representations for Deep learningZachary S. Brown
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingYasir Khan
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA DATASCIENCE
 
Universal Dependencies
Universal DependenciesUniversal Dependencies
Universal DependenciesTeresa Lynn
 
NLP for Everyday People
NLP for Everyday PeopleNLP for Everyday People
NLP for Everyday PeopleRebecca Bilbro
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text MiningMinha Hwang
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2Karthik Murugesan
 
DMTM 2015 - 17 Text Mining Part 1
DMTM 2015 - 17 Text Mining Part 1DMTM 2015 - 17 Text Mining Part 1
DMTM 2015 - 17 Text Mining Part 1Pier Luca Lanzi
 
Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingSeth Grimes
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 

Similar to [系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹 (20)

1004-nlp.ppt
1004-nlp.ppt1004-nlp.ppt
1004-nlp.ppt
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
 
HLT
HLTHLT
HLT
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
Text Representations for Deep learning
Text Representations for Deep learningText Representations for Deep learning
Text Representations for Deep learning
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2
 
Universal Dependencies
Universal DependenciesUniversal Dependencies
Universal Dependencies
 
NLP for Everyday People
NLP for Everyday PeopleNLP for Everyday People
NLP for Everyday People
 
Bird05 nltk-intro
Bird05 nltk-introBird05 nltk-intro
Bird05 nltk-intro
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
 
intro.ppt
intro.pptintro.ppt
intro.ppt
 
1 Introduction.ppt
1 Introduction.ppt1 Introduction.ppt
1 Introduction.ppt
 
Intro
IntroIntro
Intro
 
Intro
IntroIntro
Intro
 
DMTM 2015 - 17 Text Mining Part 1
DMTM 2015 - 17 Text Mining Part 1DMTM 2015 - 17 Text Mining Part 1
DMTM 2015 - 17 Text Mining Part 1
 
Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language Processing
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 

More from 台灣資料科學年會

[台灣人工智慧學校] 人工智慧技術發展與應用
[台灣人工智慧學校] 人工智慧技術發展與應用[台灣人工智慧學校] 人工智慧技術發展與應用
[台灣人工智慧學校] 人工智慧技術發展與應用台灣資料科學年會
 
[台灣人工智慧學校] 執行長報告
[台灣人工智慧學校] 執行長報告[台灣人工智慧學校] 執行長報告
[台灣人工智慧學校] 執行長報告台灣資料科學年會
 
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰台灣資料科學年會
 
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機台灣資料科學年會
 
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機台灣資料科學年會
 
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話台灣資料科學年會
 
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇台灣資料科學年會
 
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察 [TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察 台灣資料科學年會
 
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵台灣資料科學年會
 
[台灣人工智慧學校] 從經濟學看人工智慧產業應用
[台灣人工智慧學校] 從經濟學看人工智慧產業應用[台灣人工智慧學校] 從經濟學看人工智慧產業應用
[台灣人工智慧學校] 從經濟學看人工智慧產業應用台灣資料科學年會
 
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告台灣資料科學年會
 
[台中分校] 第一期結業典禮 - 執行長談話
[台中分校] 第一期結業典禮 - 執行長談話[台中分校] 第一期結業典禮 - 執行長談話
[台中分校] 第一期結業典禮 - 執行長談話台灣資料科學年會
 
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人台灣資料科學年會
 
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維台灣資料科學年會
 
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察台灣資料科學年會
 
[TOxAIA新竹分校] 深度學習與Kaggle實戰
[TOxAIA新竹分校] 深度學習與Kaggle實戰[TOxAIA新竹分校] 深度學習與Kaggle實戰
[TOxAIA新竹分校] 深度學習與Kaggle實戰台灣資料科學年會
 
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT台灣資料科學年會
 
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達台灣資料科學年會
 
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳台灣資料科學年會
 

More from 台灣資料科學年會 (20)

[台灣人工智慧學校] 人工智慧技術發展與應用
[台灣人工智慧學校] 人工智慧技術發展與應用[台灣人工智慧學校] 人工智慧技術發展與應用
[台灣人工智慧學校] 人工智慧技術發展與應用
 
[台灣人工智慧學校] 執行長報告
[台灣人工智慧學校] 執行長報告[台灣人工智慧學校] 執行長報告
[台灣人工智慧學校] 執行長報告
 
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
 
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
 
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
 
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
 
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
 
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察 [TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
 
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
 
[台灣人工智慧學校] 從經濟學看人工智慧產業應用
[台灣人工智慧學校] 從經濟學看人工智慧產業應用[台灣人工智慧學校] 從經濟學看人工智慧產業應用
[台灣人工智慧學校] 從經濟學看人工智慧產業應用
 
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
 
台灣人工智慧學校成果發表會
台灣人工智慧學校成果發表會台灣人工智慧學校成果發表會
台灣人工智慧學校成果發表會
 
[台中分校] 第一期結業典禮 - 執行長談話
[台中分校] 第一期結業典禮 - 執行長談話[台中分校] 第一期結業典禮 - 執行長談話
[台中分校] 第一期結業典禮 - 執行長談話
 
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
 
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
 
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
 
[TOxAIA新竹分校] 深度學習與Kaggle實戰
[TOxAIA新竹分校] 深度學習與Kaggle實戰[TOxAIA新竹分校] 深度學習與Kaggle實戰
[TOxAIA新竹分校] 深度學習與Kaggle實戰
 
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
 
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
 
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
 

Recently uploaded

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 

Recently uploaded (20)

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 

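[Event Series] Natural Language Processing Everywhere: Basic Concepts, Techniques, and Tools

  • 1. Lun-Wei Ku, NLPSA, Academia Sinica. Natural Language Processing Everywhere: Basic Concepts, Techniques, and Tools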
  • 2. Speaker
      Lecturer: Lun-Wei Ku
      Currently: Assistant Research Fellow, IIS, Academia Sinica; Adjunct Assistant Professor, NCTU
      • Working on NLP and sentiment analysis
      • Running the NLPSA Lab
          http://www.lunweiku.com/
          http://academiasinicanlplab.github.io/
      • Ongoing projects:
          – Graph embedding, emotion-enabled dialog system, cross-lingual text suggestion, proactive dialog generation from images and texts
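  • 3. Outline
      9:30 - 10:30   What is natural language processing?
      10:30 - 10:50  Tea break
      10:50 - 12:30  Tools and resources for Chinese and English text processing
      12:30 - 13:20  Lunch
      13:20 - 15:00  Challenges for NLP on the web and social media
      15:00 - 15:20  Tea break
      15:20 - 17:00  NLP trends and industry applications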
  • 6. What is Natural Language Processing?
      • Natural language processing (NLP) is a field of computer science, artificial intelligence and computational linguistics concerned with the interactions between computers and human (natural) languages, and, in particular, concerned with programming computers to fruitfully process large natural language corpora. Challenges in natural language processing frequently involve natural language understanding, natural language generation (frequently from formal, machine-readable logical forms), connecting language and machine perception, dialog systems, or some combination thereof. (Wikipedia)
  • 9. Domains (1)
      • Biomedical
      • Cognitive Modeling and Psycholinguistics
      • Dialogue and Interactive Systems
      • Discourse and Pragmatics
      • Generation and Summarization
      • Information Extraction, Retrieval, Question Answering, Document Analysis and NLP Applications
      • Machine Learning
      • Machine Translation
      • Multidisciplinary
      • Multilinguality
      • Phonology, Morphology and Word Segmentation
      • Resources and Evaluation
      • Semantics
      • Sentiment Analysis and Opinion Mining
      • Social Media
      • Speech
      • Tagging, Chunking, Syntax and Parsing
      • Vision, Robotics and Grounding
  • 10. Domains (2)
      • Text to speech / Speech synthesis
      • Speech recognition
      • Chinese word segmentation
      • Part-of-speech tagging
      • Parsing
      • Natural language generation
      • Text categorization
      • Information retrieval
      • Information extraction
      • Text proofing
      • Question answering
      • Machine translation
      • Automatic summarization
      • Textual entailment
  • 11. Applications (1)
      • IBM Watson on Jeopardy: https://www.youtube.com/watch?v=WFR3lOm_xhE
      • Google Translate / Google小姐 ("Miss Google")
  • 12. Applications (2)
      • Spam filtering <-> ad pushing
          – Google AdSense and many others
      • Spelling and grammar correction
          – Grammarly, a free grammar checker: https://www.grammarly.com/
          – Duolingo: https://www.duolingo.com/
          – Pigai (批改網): https://www.pigai.org/
          – …
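  • 13. Applications (3)
      • Paper generator
          – Mathgen: http://thatsmathematics.com/mathgen/
      • Poem generator: 《秋蟲的聲音》 ("The Sound of Autumn Insects")
          – 幸運將要投奔你的門上的時候 (When luck is about to come to your door)
          – 秋蟲的聲音也沒有 (There is not even the sound of autumn insects)
          – 你的眼睛的誘惑 (The temptation of your eyes)
          – 在天空中飛動 (Flies through the sky)
          – 像人家把門關了幾天吧 (As if someone had shut the door for a few days)
          – 我一個迷人的容貌 (I, a charming face)
          – 有時候不必再有一個太陽 (Sometimes there is no need for another sun)
          – 把大地照成一顆星球 (To light the earth into a planet)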
  • 14. Applications (4)
      • Problem solver
          – Math solver: https://www.cymath.com/
          – Step by step; NLP combined with other components (graphs, formulas, …)
  • 15. Applications (5)
      • AI doctor
          – IBM Watson Health
              • Optimize performance
              • Engage consumers
              • Enable effective care
              • Manage population health
          – Why is an AI doctor related to NLP?
              • MedNLP: medical records, communication, …
  • 16. Applications (6)
      • Summarization
          – Best demonstration: 谷阿莫
      • Sentiment / opinion / review analysis
      • Social media / network applications
          – Full of text!
      Image sources: blog.investis.com, techxb.com
  • 17. Application: Multi-modal NLP
      • Captioning
  • 18. Application: Multi-modal NLP
      • Storytelling
  • 19. Other Close Disciplines
      • Artificial Intelligence (AI)
      • Information Retrieval (IR)
      • Machine Learning (ML)
      • Human Computer Interaction (HCI)
  • 20. NLP and AI
      • NLP handles the input and output of unstructured information for AI applications.
      • AI applications are expected to write and speak like people.
      • NLP is becoming more and more important in AI.
      • However, NLP is challenging.
  • 21. NLP and IR
      • NLP borrows some concepts from IR, especially word weighting schemes.
      • For IR, efficiency is very important. Some time-constrained NLP tasks also incorporate ideas from IR to save time, e.g., clustering or offline preprocessing.
  • 22. NLP and ML
      • In the past, NLP techniques relied heavily on linguistic knowledge in the form of rules or probabilities.
      • Nowadays NLP uses many ML/DL techniques.
  • 23. NLP and HCI
      • Language (written or spoken) is a way for computers to communicate with people.
      • Representing information and using it appropriately can mitigate the errors people perceive.
      • NLP + HCI may lead to killer apps.
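  • 24. Everywhere?
      • Humans are social animals, and language is the tool humans use to communicate
      • It is the input and output channel for information in the brain
      • We use language every day and depend on it for our living
      • Cannot speak? Cannot hear?
      At all times, everywhere!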
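  • 25. Sample Text (Chinese)
      • 下雨天留客天留我不留 (one unpunctuated string with several readings, depending on where the word boundaries fall)
          – 下雨天留客 天留我不留 (roughly: "A rainy day keeps the guest; heaven would keep you, but I will not")
          – 下雨天 留客天 留我不 留 (roughly: "A rainy day, a day for keeping guests. Will you keep me? Stay!")
          – 下雨天 留客天 留我不留 (roughly: "A rainy day, a day for keeping guests. Will you keep me or not?")
      • 紅鯉魚與綠鯉魚與驢與鯉魚與驢與紅鯉魚與驢與綠鯉魚 (a tongue twister; literally "red carp and green carp and donkey and carp and donkey and red carp and donkey and green carp")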
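  • 26. Typical Challenges
      • NLU: Natural Language Understanding
      • Inference
          – 玻璃杯碎了一地 → 玻璃杯不能用了 ("The glass shattered all over the floor" → "The glass can no longer be used")
      • Language change and the emergence of new words, phrases, and concepts
          – Domain-specific: 跆拳道的品勢 (poomsae, the set forms of taekwondo)
          – Social media: 多多變套套 (internet slang)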
  • 28. Wrap Up 1
      • What is NLP?
      • What applications are related to NLP?
      • NLP and NLU
      • What are the current challenges?
      • Next, let's move on to NLP itself!
          – Introducing the concepts and trying the tools online (if available)
  • 30. First, get your corpus/datasets/materials ready! (11 December 2016)
  • 31. Natural Language Processing
      • Basic functions
          – (Word segmentation)
          – Part-of-speech tagging
          – (Stemming)
          – Named entity extraction
          – (Syntactic) parsing
          – Coreference resolution
          – Text categorization
  • 32. Word Segmentation
      • Some written languages, such as Chinese or Japanese, have no explicit word boundary markers.
      • If words are to be the basic units for text processing, we need to know where their boundaries are.
      • 下雨天留客天留我不留
      • 私は自然言語処理を好む ("I like natural language processing")
      • أنا أفضل معالجة اللغة الطبيعية ("I prefer natural language processing")
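To make the boundary problem concrete, here is a minimal sketch of greedy forward maximum matching, one simple dictionary-based way to segment unspaced text. The slides do not say which method the tools they introduce use, and both toy dictionaries below are hypothetical. The point is only that the segmentation, and therefore the reading, depends on what the dictionary contains.

```python
def max_match(text, vocab):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if nothing matches."""
    max_len = max((len(word) for word in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    return tokens


sentence = "下雨天留客天留我不留"
vocab_a = {"下雨天", "留客", "留我", "不留"}  # hypothetical toy dictionary
vocab_b = vocab_a | {"留客天"}                 # same dictionary plus one entry

print(max_match(sentence, vocab_a))  # segments 留客 and 天 separately
print(max_match(sentence, vocab_b))  # segments 留客天 as one word
```

Adding a single entry changes the output, which is one reason dictionary coverage (and new words) matter so much for segmentation.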
  • 33. Stemmer (English) • Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form, generally a written word form. (Wikipedia) • Example: “I love natural language processing.” → “I love natur languag process .”
  • 34. TF‧IDF (1) • Widely used in IR • Term frequency × inverse document frequency • Calculates the weight of each term (usually a word) in a dataset • An example of how to represent documents (TF‧IDF weight of each term in each play):
      Term        Antony and Cleopatra   Julius Caesar   The Tempest   Hamlet   Othello   Macbeth
      Antony      5.25                   3.18            0             0        0         0.35
      Brutus      1.21                   6.1             0             1        0         0
      Caesar      8.59                   2.54            0             1.51     0.25      0
      Calpurnia   0                      1.54            0             0        0         0
      Cleopatra   2.85                   0               0             0        0         0
      mercy       1.51                   0               1.9           0.12     5.25      0.88
      worser      1.37                   0               0.11          4.15     0.25      1.95
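  • 35. TF‧IDF (2) • A very old but effective formula • The importance of a term is determined by two factors: – The more often it occurs within a document, the more important it is – The fewer documents it occurs in, the more important it is • $w_{t,d} = (1 + \log_{10} \mathrm{tf}_{t,d}) \times \log_{10}(N / \mathrm{df}_t)$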
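The weighting on this slide, (1 + log10 of the term frequency) times log10 of (number of documents / document frequency), can be sketched in a few lines. The function name `tfidf_weight` and the counts below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def tfidf_weight(tf, df, n_docs):
    """w = (1 + log10(tf)) * log10(N / df); zero when the term is absent."""
    if tf == 0 or df == 0:
        return 0.0
    return (1 + math.log10(tf)) * math.log10(n_docs / df)


# Hypothetical counts: the term occurs 10 times in this document and
# appears in 10 of 1,000 documents overall.
weight = tfidf_weight(tf=10, df=10, n_docs=1000)
print(weight)
```

The logarithm on the term frequency dampens the effect of very frequent terms, and a term that appears in every document gets weight 0 because log10(N / N) = 0.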
  • 36. Bag of Words Model • Often abbreviated as “BOW” • Words are used as features WITHOUT their order. • 我給你一百萬 (I give you a million) = 你給我一百萬 (you give me a million) • Usually combined with n-gram features – Unigrams: 我 給 你 一 百 萬 – Bigrams: 我給 給你 你一 一百 百萬 – Trigrams: 我給你 給你一 你一百 一百萬
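A small sketch shows why n-grams are added to a bag of words: with unigrams alone the two example sentences are indistinguishable, while bigrams recover some word order. The helper names `char_ngrams` and `bag_of_ngrams` are assumptions for this illustration, and characters stand in for words for simplicity.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bag_of_ngrams(text, orders=(1, 2)):
    """Count n-grams of the given orders, ignoring where they occur."""
    bag = Counter()
    for n in orders:
        bag.update(char_ngrams(text, n))
    return bag


a, b = "我給你一百萬", "你給我一百萬"
same_unigrams = bag_of_ngrams(a, orders=(1,)) == bag_of_ngrams(b, orders=(1,))
same_with_bigrams = bag_of_ngrams(a) == bag_of_ngrams(b)
print(same_unigrams, same_with_bigrams)
```

In practice the same counting is done over segmented words rather than characters, and the counts are then reweighted with TF‧IDF.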
  • 37. TF‧IDF + BOW (uni- and bi-grams) is still the state of the art for some tasks today, or at least performs very close to it, which makes it a very strong baseline.
  • 38. • So far we have word-level information. • Next, we add more information to words, and then to larger segments.
  • 39. Part of Speech (POS) Tagging • I love natural language processing. • (PRP I) (VBP love) (JJ natural) (NN language) (NN processing) (. .) • Tag legend: PRP = personal pronoun; VBP = verb, non-3rd person singular present; JJ = adjective; NN = noun, singular or mass • Tags vary with the tag set in use; these come from the Penn Treebank tag set.
  • 40. Parsing (English) • Constituent Parse Tree (ROOT (S (NP (PRP I)) (VP (VBP love) (NP (JJ natural) (NN language)) (NP (NN processing))) (. .)))
  • 41. Parsing (English) • POS • Dependency Tree • Dependency Parser
  • 42. Semantic Role Labeling • Labels the role each word plays in a sentence from the semantic aspect. https://www.slideshare.net/marinasantini1/semantic-role-labeling
  • 43. Parsing (Chinese) • Stanford Parser (Simplified Chinese) • 语言云 (Language Technology Platform Cloud, LTP-Cloud) (Simplified Chinese) – 哈工大-讯飞语言云 (HIT-iFLYTEK Language Cloud, 2014) – Results are obtained via HTTP requests • CKIP Parser
  • 44. Parsing (Chinese) • 我爱 自然 语言 处理 • 我爱/VV 自然/NN 语言/NN 处理/NN • root(ROOT-0, 我爱-1) • compound:nn(处理-4, 自然-2) • compound:nn(处理-4, 语言-3) • dobj(我爱-1, 处理-4) • Error propagation! The segmentation error (我爱 kept as a single token) carries over into tagging and parsing.
  • 45. Parsing (Chinese) • #1:1.[0] S(experiencer:NP(Head:Nhaa:我)|Head:VL1:愛|reason:NP(property:Na:自然|Head:Nac:語言)|goal:VP(Head:VC2:處理))#
  • 46. Using Tools (1) • Stanford Parser • Stanford CoreNLP (Demo) • Berkeley Parser • SRL (Demo)
  • 48. Using Tools (2) • Jieba (segmentation, Python code) – HMM/Viterbi algorithm • CKIP – Chinese Segmentor/POS Tagger – Parser
  • 49. NLP Tools Comparison • For the traditional Chinese text environment… • Tools compared: Stanford CoreNLP, Jieba, CKIP • Criteria: language support, ease of use, domain adaptation, performance, price
  • 50. Using Tools (3) • NLTK (Python): tokenize, tag, NE extraction, show parse trees – Porter stemmer – n-grams • For TF‧IDF, use scikit-learn (machine learning in Python).
  • 51. Semantic Resources • WordNet (English) online demo • Freebase (English): API shut down on Aug 31, 2016 => Google’s Knowledge Graph • HowNet (Simplified Chinese) • E-HowNet (Traditional Chinese)
  • 52. Word Embeddings (1) • How word embeddings differ from the word vectors used in the past: they support semantic arithmetic, e.g., king + woman – man ≈ queen
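The king/queen arithmetic can be illustrated with a toy example. The two-dimensional vectors below are invented for this sketch (one axis for royalty, one for gender) and are not real embeddings; with trained vectors such as word2vec or GloVe the same nearest-neighbor search runs over hundreds of dimensions and the match is only approximate.

```python
import math

# Toy 2-d vectors invented for illustration: (royalty, gender).
TOY_VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) + vec(b) - vec(c), excluding the inputs."""
    target = tuple(x + y - z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


answer = analogy("king", "woman", "man", TOY_VECTORS)
print(answer)
```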
  • 53. Word Embeddings (2) • Use pre-trained embeddings or train your own! – word2vec (w2v) – GloVe • What if I don’t know deep learning? You can find a variety of pre-trained embeddings on the Web.
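  • 54. I know these kinds of information and processing methods. What next? • You can do – Information extraction: if you don’t know any learning method, you can write rules to extract the information you need! – Machine learning: if you know a little machine learning, the text processing methods just introduced yield many kinds of information to use as features for learning a model, such as word frequency, importance (weight), linguistic features (POS), sentence structure (parse trees), semantics (semantic ontologies, word embeddings), and so on.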
  • 55. NLP Tasks • Most of them can be transformed into – Classification problem – Clustering problem – (Sequential) Labeling problem
  • 57. Wrap Up – Part II • For the English and Chinese languages • Pre-processing tools • Syntactic analysis tools • Semantic analysis tools
  • 59. 1. WWW/Social Media NLP 2. Sentiment Analysis Tool
  • 60. Not only texts… • Money • Network • Sentiment • User (icons created by Freepik)
  • 61. Differences • Web and social media texts are a written form of spoken language. – New words – Typos – Urban language – Cyber language – Abbreviations – A lot of (homo)phonic/semantic puns (諧音、雙關語) – Foreign languages (激安殿堂 牛逼)
  • 62. If we just treat them as pure texts… • Raw text: – 八百屋的健太和大蔥女部分幫整個劇情超加分 – 而且兩位演技都很好呀!最喜歡一幕是 – 健太知道大蔥女的真面目後,在大蔥女再來買蔥完要離開時 – 健太衝出去要追問的樣子,一副欲言又止的臉 – 大蔥女也一副等待著健太說出來整個很曖昧的畫面 • Segmented and POS-tagged output: – 八百(Neu) 屋(Na) 的(DE) 健(VH) 太(Dfa) 和(P) 大蔥女(Na) 部分(Neqa) 幫(P) 整個(Neqa) 劇情(Na) 超(VJ) 加分(VB) 而且(Cbb) 兩(Neu) 位(Nf) 演技(Na) 都(D) 很(Dfa) 好(VH) 呀(T) !(EXCLAMATIONCATEGORY) – 最(Dfa) 喜歡(VK) 一(Neu) 幕(Nf) 是(SHI) 健(VH) 太(Dfa) 知道(VK) 大蔥女(Na) 的(DE) 真面目(Na) 後(Ng) ,(COMMACATEGORY) – 在(P) 大蔥女(Na) 再(D) 來(D) 買蔥完(VC) 要(D) 離開(VC) 時(Ng) 健(VH) 太(Dfa) 衝出去(VA) 要(D) 追問(VE) 的(DE) 樣子(Na) ,(COMMACATEGORY) – 一(Neu) 副(Nf) 欲言又止(VH) 的(DE) 臉(Na) 大蔥女(Na) 也(D) 一(Neu) 副(Nf) 等待(VK) 著(Di) 健(VH) 太(Dfa) 說出來(VB) 整個(Neqa) 很(Dfa) 曖昧(VH) 的(DE) 畫面(Na)
  • 63. Stanford vs. CKIP • CKIP: 八百(Neu) 屋(Na) 的(DE) 健(VH) 太(Dfa) 和(P) 大蔥女(Na) 部分(Neqa) 幫(P) 整個(Neqa) 劇情(Na) 超(VJ) 加分(VB) • Stanford: 八百/CD 屋/NN 的/DEG 健太/NR 和/CC 大葱/NR 女/JJ 部分/NN 帮/VV 整个/DT 剧情/NN 超加分/NN
  • 64. More Preprocessing Needed • Filter out noisy text and find the main content. – Advertising text – Formatting text • Split the text into sentences before sending them to the parser.
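Sentence splitting before parsing can be as simple as cutting after sentence-final punctuation. The following is a minimal sketch under that assumption; the punctuation set and the function name `split_sentences` are illustrative choices, and real web text (emoticons, missing punctuation, line breaks used as separators) needs more rules than this.

```python
import re

# Sentence-ending punctuation (full-width and ASCII); an assumed, minimal set.
_SENT_END = re.compile(r"[^。!?!?]*[。!?!?]+|[^。!?!?]+$")


def split_sentences(text):
    """Split text after each run of sentence-ending punctuation."""
    return [s.strip() for s in _SENT_END.findall(text) if s.strip()]


sentences = split_sentences("而且兩位演技都很好呀!最喜歡一幕是健太知道真面目後。")
print(sentences)
```

Text without any final punctuation is returned as a single trailing sentence rather than being dropped.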
  • 65. Skills We Might Need • Text Normalization • Multimedia multimodal • User and Text Networking • Social Network
  • 67. Text Normalization • Normalization converts text written in web language into formal language before further processing. • 私心喜翻的日式簡約風 → 私心喜歡的日式簡約風 • 想一起去ㄉ水水們 → 想一起去的漂亮女生們 • 漂漂是今年才成為麻麻 → 漂漂是今年才成為媽媽
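The simplest form of normalization is a lookup table that maps web-language forms to formal ones. The sketch below builds such a table from the three examples on this slide; the names `NORMALIZATION_TABLE` and `normalize` are hypothetical, and a real system would also need context to decide when a form should be rewritten at all.

```python
# Toy lookup table built from the three examples on the slide.
NORMALIZATION_TABLE = {
    "喜翻": "喜歡",
    "ㄉ": "的",
    "水水": "漂亮女生",
    "麻麻": "媽媽",
}


def normalize(text, table=NORMALIZATION_TABLE):
    """Replace each known web-language form with its formal equivalent."""
    # Replace longer keys first so a short key cannot break a longer match.
    for informal in sorted(table, key=len, reverse=True):
        text = text.replace(informal, table[informal])
    return text


normalized = normalize("想一起去ㄉ水水們")
print(normalized)
```

Because the replacement is context-free, it will also rewrite strings that only happen to contain a table key, which is one reason normalization of web text remains an open question in the slides that follow.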
  • 68. Top 10 Netizen Buzzwords of 2017 • #1 低能卡 • #2 垃圾不分藍綠 • #3 我難過 • #4 這我一定吉 • #5 發錢 • #6 8+9 • #7 銅鋰鋅 • #8 下去領500 • #9 海水退潮就知道誰沒穿褲子 • #10 少時不讀書,長大當記者 • Bonus: 廠廠
  • 69. Processing Web Text: do we need normalization?
  • 70. Or, A Parser for Web Text • A tweet POS tagger/parser such as ARK • Trained on web text to capture its characteristics. • Example tweet: “ikr smh he asked fir yo last name so he can add u on fb lololol” • Unfortunately, so far there is none for the Chinese language.
  • 71. Skills We Might Need • Text Normalization • Multimedia multimodal • User and Text Networking • Social Network
  • 73. Skills We Might Need • Text Normalization • Multimedia multimodal • User and Text Networking • Social Network
  • 74. User and Text Network (1) • We can observe this kind of network in all forum-style social media.
  • 75. User and Text Network (2)
  • 76. User and Text Network (2)
  • 77. • We will explain how to use the concept of the user and text network through the UTCNN model in the sentiment package that follows.
  • 79. Sentiment Analysis Is… • Studying opinions, sentiments, subjectivities, affects, emotions, views, etc. in text such as news, blogs, reviews, comments, dialogs, or other kinds of documents. • An important research problem: – Sentiment information is global and powerful. – Sentiment information is valuable for companies, customers and personal communication.
  • 80. Sentiment Representation • Categorical – Sentiment, non-sentiment – Positive, neutral, negative – Stars – Emotion categories such as joy, anger, sadness… • Dimensional – Valence, arousal
  • 82. CSentiPackage • Datasets – Chinese Morphological Dataset Cmorph (earlier version of ACBiMA)* – Chinese Opinion Treebank • Resources – NTUSD/ANTUSD • Tools – CopeOpi + Tag Mapping File – UTCNN *https://github.com/windx0303/ACBiMA
  • 83. Statistics • NTUSD: Sentiment Dictionary (with 10,371 words): free for research, 400+ applications • ANTUSD: Augmented NTUSD (with 27,221 words, now being integrated with E-HowNet) • Cmorph (with 8,000+ words) → ACBiMA (with 11,000+ words) • Chinese Opinion Treebank: labels on Chinese Treebank 5.1
  • 84. Materials: From Words to Sentences • NTUSD: words (binary sentiment) • ANTUSD: words (annotation features) • Chinese Morphological Dataset: words (morphological structures) • Chinese Opinion Treebank: phrases (sentence structure) • Chinese Opinion Treebank: sentences (binary sentiment)
  • 85. Tools: From Words to Sentences, Documents, and Beyond • CopeOpi Sentiment Scoring Tool: words, sentences, documents, documents+ (text) • UTCNN: posts and users (text and social media)
  • 86. NTUSD • Simplified Chinese and traditional Chinese versions • A positive word collection of 2,812 words • A negative word collection of 8,276 words • No intensity degrees, estimated scores, or other information.
  • 87. ANTUSD • 6 Fields – CopeOpi score – Number of positive annotations – Number of neutral annotations – Number of negative annotations – Number of non-sentiment annotations – Number of not-a-word annotations • Not-a-word: useful because the entries are collected from real segmented data • Example entries (word followed by the 6 fields): – 開心 0.434168 1 0 0 0 0 – 酣聲 0 0 0 1 3 0 – 憤怒 -0.80011 0 0 5 0 0
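Reading an entry in this layout is straightforward. The sketch below assumes each record is a single whitespace-separated line, as the examples on the slide appear to be; the delimiter used in the distributed file may differ, and the names `AntusdEntry` and `parse_entry` are made up for this illustration.

```python
from typing import NamedTuple


class AntusdEntry(NamedTuple):
    word: str
    copeopi_score: float
    positive: int
    neutral: int
    negative: int
    non_sentiment: int
    not_a_word: int


def parse_entry(line):
    """Parse one whitespace-separated record: word, score, five counts."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    word, score, *counts = fields
    return AntusdEntry(word, float(score), *map(int, counts))


entry = parse_entry("憤怒 -0.80011 0 0 5 0 0")
print(entry.word, entry.copeopi_score, entry.negative)
```

Keeping the annotation counts alongside the score lets a user filter out entries that annotators mostly marked as non-sentiment or not-a-word.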
  • 88. ANTUSD • Also contains short phrases such as 一昧要求, 一路過關斬將, 備受外界期待…
  • 89. ANTUSD and E-HOWNET • An integration of two resources which may help us work with sentiment and semantics together. • Related English resource: SentiWordNet – Refers to WordNet – With PosScore and NegScore added – ObjScore = 1 - (PosScore + NegScore) • E-HowNet – A frame-based entity-relation model extended from HowNet – Defines lexical senses (concepts) in a hierarchical manner – Now integrated with ANTUSD, covering 47.7% of the words in ANTUSD
  • 90. ANTUSD in E-HOWNET
  • 92. Chinese Morphological Structure • Parallel type: 財富 (rich wealth) • Substantive-Modifier type: 痛哭 (bitterly cry) • Subjective-Predicate type: 山崩 (land slip; landslide) • Verb-Object type: 避暑 (escape from summer) • Verb-Complement type: 提高 (increase: raise up) • Negation type: 無情 (no feelings) • Confirmation type: 有心 (have heart) • Others
  • 93. Chinese Opinion Treebank • Based on Chinese Treebank 5.1. • Includes the opinion label of each sentence. • Includes the word pairs and their composition types in opinionated sentences. • To avoid copyright issues, you need to obtain Chinese Treebank 5.1 yourself in order to use the Chinese Opinion Treebank!
  • 94. Chinese Opinion Treebank • SID=230: 黄河“金三角”成为新的投资热点 • .node file – Fields: node ID, POS, node content, node depth – Content: 0,,,0 1,IP-HLN,,1 2,NP-SBJ,,2 3,NP-PN,,3 4,NR,黄河,4 5,NP,,3 6,PU,“,4 7,NN,金三角,4 8,PU,”,4 9,VP,,2 10,VV,成为,3 11,NP-OBJ,,3 12,CP,,4 13,WHNP-1,,5 14,-NONE-,*OP*,6 15,CP,,5 16,IP,,6 17,NP-SBJ,,7 18,-NONE-,*T*-1,8 19,VP,,7 20,VA,新,8 21,DEC,的,6 22,NP,,4 23,NN,投资,5 24,NN,热点,5 • .tree file – Fields: node ID: children – Content: 0:1, 1:2,9, 2:3,5, 3:4, 4: 5:6,7,8, 6: 7: 8: 9:10,11, 10: 11:12,22, 12:13,15, 13:14, 14: 15:16,21, 16:17,19, 17:18, 18: 19:20, 20: 21: 22:23,24, 23: 24: • .trio file – Fields: trio ID, trio head, trio left node, trio right node, trio type – Content: 2,1,2,9,3 3,22,23,24,2 • Opinion labels of three annotators (filename, SID, opinion, polarity, opinion type): – chtb_020.raw,230,N,, – chtb_020.raw,230,Y,POS,STATUS – chtb_020.raw,230,Y,POS,STATUS • Opinion gold standard: chtb_020.raw,230,Y,POS,STATUS
  • 95. Notation (Parsing Tree) • T: the parsing tree of a sentence S • O = {o1, o2, …}: the set of tree nodes, listed in in-order • tri: an opinion trio • Rpt ∈ {Substantive-Modifier, Subjective-Predicate, Verb-Object, Verb-Complement, Other}: a syntactic inter-word relation • Tri(S) = – 1, IP, 活动, VP, Subjective-Predicate – 2, VP, 取得, NP-OBJ, Verb-Object – 3, NP-OBJ, 圆满, 成功, Substantive-Modifier
  • 96. Chinese Opinion Treebank • Align the opinion labels of sentences to Chinese Treebank 5.1 by sentence IDs. • Align opinion trios to Chinese Treebank 5.1 by node IDs. • Can be used to do opinion cause analysis.
  • 97. CopeOpi • A statistical sentiment analysis tool • Can be used without any training • Users can update character weights or add any sentiment words • It runs fast.
  • 98. The First Idea • Chinese characters are mostly morphemes, and they bear sentiment, too. • Simple example: some characters are preferred for naming, while others are avoided. • For example, 德 (virtue), 胜 (win) and 高 (high) are good for names; 笨 (stupid), 悲 (sorrow) and 惨 (terrible) are not. • There are exceptions, such as 徐悲鸿, but the approach is still quite reliable if character sentiment is acquired statistically from a large naming corpus (or simply from sentiment dictionaries).
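  • 99. Bag of Unit • Sample sentiment words: 好人、美麗、憤怒、弱小… • $P_{c_i} = \dfrac{fp_{c_i} / \sum_{j=1}^{n} fp_{c_j}}{fp_{c_i} / \sum_{j=1}^{n} fp_{c_j} + fn_{c_i} / \sum_{j=1}^{m} fn_{c_j}}$ • $N_{c_i} = \dfrac{fn_{c_i} / \sum_{j=1}^{m} fn_{c_j}}{fp_{c_i} / \sum_{j=1}^{n} fp_{c_j} + fn_{c_i} / \sum_{j=1}^{m} fn_{c_j}}$ • $S_{c_i} = P_{c_i} - N_{c_i}$ • $S_w = \dfrac{1}{p} \sum_{j=1}^{p} S_{c_j}$ • Examples: [仇 (-1.0) + 視 (0.0)] / 2 = -1/2 = -0.5 (NEG); [富 (1.0) + 貴 (0.936)] / 2 = 0.968 (POS)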
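The bag-of-unit idea on this slide can be sketched as follows, assuming that fp and fn are the frequencies of a character in positive and negative sentiment words, that each is normalized by the total character count of its word list, and that a word's score is the average of its characters' scores. The function names `char_scores` and `word_score` are hypothetical; this is an illustration of the formulas, not the CopeOpi implementation.

```python
from collections import Counter


def char_scores(positive_words, negative_words):
    """Score each character by S = P - N from its normalized frequencies."""
    fp = Counter(ch for w in positive_words for ch in w)
    fn = Counter(ch for w in negative_words for ch in w)
    total_p, total_n = sum(fp.values()), sum(fn.values())
    scores = {}
    for ch in set(fp) | set(fn):
        p = fp[ch] / total_p if total_p else 0.0
        n = fn[ch] / total_n if total_n else 0.0
        # P = p / (p + n), N = n / (p + n), so S = (p - n) / (p + n).
        scores[ch] = (p - n) / (p + n)
    return scores


def word_score(word, scores):
    """Average the character scores; unseen characters count as 0."""
    return sum(scores.get(ch, 0.0) for ch in word) / len(word)


# Character scores taken from the slide's two worked examples.
slide_scores = {"仇": -1.0, "視": 0.0, "富": 1.0, "貴": 0.936}
hostile = word_score("仇視", slide_scores)
wealthy = word_score("富貴", slide_scores)
print(hostile, wealthy)
```

Normalizing by the size of each word list keeps the scores from being biased toward whichever polarity has more dictionary entries.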
  • 100. Aggregation • Word sentiment – Summing up opinion scores of characters • Sentence sentiment – Summing up opinion scores of words So is there any way we can give them weights?
  • 101. Weighted by Structures • Linguistic information: – Morphological structures • Intra-word structures – Sentence syntactic structures • Inter-word structures
  • 102. Morphological Structure • Linguistic morphological types (type: examples) – 1. Parallel: 財富、打罵 – 2. Substantive-Modifier: 低級、痛哭 – 3. Subjective-Predicate: 心疼、氣虛 – 4. Verb-Object: 失控、免職 – 5. Verb-Complement: 看清、擊潰 • Opinion morphological types (type: examples) – 6. Negation: 無法、不慎 – 7. Confirmation: 有賴、有愧 – 8. Others: 姪子、薄荷 • Types can be obtained by SVM, CRF, handcrafted rules…
  • 103. Example of Sentiment Trios in Chinese Opinion Treebank • Linguistic morphological types (type: example) – Parallel (skipped): 美麗而聰慧 – 1. Substantive-Modifier: 高大的樓房 – 2. Subjective-Predicate: 學習認真 – 3. Verb-Object: 恢復疲勞 – 4. Verb-Complement: 收拾乾淨 • Opinion morphological types (type: example) – n. Others: 為…/以…
  • 104. Compositional Chinese Sentiment Analysis • Example: 氣虛 – Subjective-Predicate type – 氣: 0.5195 – 虛: -0.8178 – Score(氣虛) = -0.8178 • Example: 看清、看壞 – Verb-Complement type – 看: 0.1 – 清: 0.8032 – 壞: -0.9 – Score(看清) = 0.8072 – Score(看壞) = -0.9
  • 105. Example of Using Sentiment Trios • Composition rules for S(C1C2), stated as if/then/else conditions on S(C1) and S(C2), are given for the Substantive-Modifier type and the Verb-Object type. • Node scores in the example: 0.3018, 0.4109, 0.6736 • Score: 0.6736
  • 106. Preprocessing • Tokenization (segmentation) – Jieba – CKIP – Stanford parser • Part-of-speech tagging – CKIP – Stanford parser • Tokenization is mandatory; a version in which it is optional will be released in the future.
  • 107. CopeOpi – example • $ ./run_trad.sh – Runs CopeOpi on the files in the list “file.lst” • Check the results in out/0001.txt • test_trad.txt • 0001
  • 108. CopeOpi – example • Result summary in ./out.csv
  • 109. Deep Neural Network Example: Word • Morphological structure for a better word representation. • Same idea but for *Chinese sentiment analysis* • Luong, Thang, Richard Socher, and Christopher D. Manning. "Better Word Representations with Recursive Neural Networks for Morphology." CoNLL. 2013.
  • 110. Deep Neural Network Example: Sentence • Learned composition function (of semantics): Richard Socher (RNN, a series of works from 2011 onward)
  • 111. Learning by Neural Network • Word Sentiment • Sentence Sentiment • Document Sentiment • Social Media Post Sentiment
  • 112. Learning by Deep Neural Network • Word Sentiment: CNN + ANTUSD • Sentence Sentiment • Document Sentiment • Social Media Post Sentiment: Text + User Context – Structures are not yet considered!
  • 113. CSentiPackage: UTCNN Learning by Deep Neural Network • Word Sentiment: CNN + ANTUSD • Sentence Sentiment • Document Sentiment • Social Media Post Sentiment: Text + User Context
  • 114. User Topic Comment Neural Network (UTCNN) • A deep learning model for stance classification on social media text • Inputs to the deep learning model: authors, likers, post content, comment content, commenters, topics
  • 115. UTCNN • Stance tendency – Author – Liker – Topic – Commenter • Semantic preference – Author – Liker – Topic – Commenter • Example – Post: “We should reject the re-construction of the nuclear power plant.” – Comments: “Great!”, “NO!”, ……
  • 116. If you don’t know anything about deep learning (again) … – I won’t talk too much about it. No worries. – You can take the courses organized by 臺灣資料科學協會 (Taiwan Data Science Association) – For now, it is enough to know that this is a DNN-based Chinese sentiment model.
  • 117. Social Media Dataset Released in CSentiPackage • Facebook fan groups (Chinese) – Author/liker/comment/commenter – Single topic (learn latent topics by LDA) – Unbalanced – Chinese • Create Debate (English) – Author – Four topics – Balanced – English
  • 118. Environment • Software – OS: Linux – Programming language • Java 6 or higher • Python 2.7 – Theano 0.8.2 – Keras 1.0.3 – sklearn • Hardware – Graphics card (deep learning)
  • 119. Demo Environment • CPU – Intel Xeon E5-2630 v3 ×2 • RAM – 64 GB • OS – Ubuntu 14.04 LTS • Graphics cards – Nvidia Tesla K40 ×2
  • 120. UTCNN - data • 3 46 57 … 573 49 61 4 -1 <sssss>福 島 核電廠 的 熔 毀 核 燃料棒 到底 有沒有 掉到 地下水層 …..<sssss>詳 見 俄國 時報 電視 專訪 <sssss> 544 490 565 … 428 危機 ,如果 安全 你 家 借放 ,事實 是 沒有 人 知道 真相 這 些 都 只是 推論 就 看 誰 的 推論 有 根據 合理 奇怪 的 是 擁核 五 毛 只 根據 東京 電力 的 說法 而 東京 電力 是 最 有 利益 關係 最 有 企圖 掩藏 事實 的 事主 貼 此 文 是 提 供 大家 獨立 沒有 核電 利益 纏身 的 核工 專家 與 小出裕 章 的 推論 僅 供 參考
  • 121. UTCNN - demo http://doraemon.iis.sinica.edu.tw/wordforce/
  • 123. Something Important About CSentiPackage • The CSentiPackage you obtained is only for your group to use for research purposes. • It has been officially released, so it can be downloaded at any time. • Download or check what’s new @ http://academiasinicanlplab.github.io/ • Find the tutorial materials of CSentiPackage @ http://www.lunweiku.com/
  • 124. Skills We Might Need • Text Normalization • Multimedia multimodal • User and Text Networking • Social Network
  • 125. NLP and Social Network • NLP sometimes serves as a pre-processing step for social network research, to deal with unstructured data. • NLP in social media is sometimes referred to as social media analytics. • NLP models can help find information such as events, sentiment and named entities for social network analysis. • Network analysis algorithms can benefit NLP research by bringing in heterogeneous features.
  • 126. Challenges • Integrating features is not easy • Integrating knowledge is not easy, either • The data are big, so performance and efficiency must be traded off. • Social media keep changing and differ across generations. • Visualizing both texts and the network is challenging.
  • 127. Wrap Up – Part III • More context, more to know • More context, better for guessing • Inner context, outer context, inter context • Pay more attention to the relations
  • 129. 1. Industrial Needs and Apps 2. Future Trend
  • 130. Industrial Needs • Techniques can make money • Techniques can provide better services (and then make money) • Techniques can keep users engaged (and then make money)
  • 131. Applications • Ads • Recommendation • QA • Interface: Chatbot
  • 132. Advertisement • The most direct way to make a profit
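  • 133. Ads (1) • Google AdSense – How AdSense works: website owners can use Google AdSense to earn money from their own online content. AdSense serves text and multimedia ads suited to your site's content and visitors. The ads are created and paid for by advertisers who want to promote their products; because advertisers pay different amounts for different ads, the amount you earn will vary as well. • Ad market share: Google + FB account for 90% • But there is very little you can do here with NLP.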
  • 135. Recommendation (Product Recommendation) • Content-based • Collaborative filtering • User behavior • NLP techniques are needed mostly for content-based recommendation (items on e-commerce websites).
  • 136. • User behavior can be related to unstructured data.
  • 138. Mobile: Apps Recommendation (1) • Information sources: descriptions, reviews, users, images, others
  • 139. Mobile: Apps Recommendation (2) • Group apps by similarity (e.g., communication) or by event (e.g., travel).
  • 140. Chatbot: Where is my Dr. Know? • A new interface connected to understanding and text generation.
  • 142. Two major purposes of chatbot • Chit-chat • Task-oriented • The most natural chatbots mix the two.
  • 143. Four major types of functions • Assistant (MS Cortana) • Companion (MS XiaoIce 小冰) • Customer service (京東 JIMI) • Question answering (IBM Watson)
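  • 144. Chatbot • Retrieval-based – Principle: reply with what people usually say next, i.e., the reply to the most similar utterance – Advantage: every sentence has been said by a human, so replies rarely have grammatical problems • Generation-based – Principle: most generation-based models today are implemented as deep learning models, which learn the encoding-decoding relation between the previous sentence and the current one in order to generate the best reply – Advantage: can produce new replies never seen in the corpus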
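The retrieval-based principle can be sketched in a few lines: find the stored prompt most similar to the user's utterance and return the reply that followed it. The corpus, the character-bigram Jaccard similarity, and the function names below are all assumptions chosen for brevity; production systems typically use TF‧IDF or embedding similarity over a much larger dialog corpus.

```python
def bigrams(text):
    """Character bigrams, used here as a crude similarity signal."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_reply(utterance, pairs):
    """Return the stored reply whose prompt is most similar to the utterance."""
    query = bigrams(utterance.lower())
    best = max(pairs, key=lambda pair: jaccard(query, bigrams(pair[0].lower())))
    return best[1]


# Hypothetical (prompt, reply) pairs standing in for a dialog corpus.
CORPUS = [
    ("what time do you open", "We open at 9 am."),
    ("where is your store", "We are next to the station."),
    ("how much is shipping", "Shipping is free over $50."),
]

reply = retrieve_reply("What time do you open tomorrow?", CORPUS)
print(reply)
```

Because every reply comes verbatim from the corpus, the output is always fluent, but the bot can never say anything that is not already stored.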
  • 145. Chatbot • Slot filling: – Sequential tagging – templates
  • 147. Chatbot Challenges • Crossing domains is difficult. • Very big data is needed. • Connecting to background knowledge is challenging. • However, a chatbot performs satisfactorily as a small, limited bot. Many Facebook stores use this kind of chatbot to sell things and provide services.
  • 148. Future Trend • Application oriented NLP – (character-based, no more segmentation/parsing…) • Semantic oriented NLP • Language independent NLP • Multi-modal NLP • Multi-sourced/featured NLP • Knowledge empowered NLP
  • 149. Final Wrap Up • You now know what NLP is • You have looked at the major NLP tools • You have heard about the cool things NLP can do • Start NLP today!