Slides sentiment 2013 10-3

Brief Introduction to Sentiment Analysis

Joachim De Beule

4 May 2013

What is sentiment?

Expression of:

- an emotion (I am happy)
- an evaluation (Great idea!)
- a stance (I support the bill)

Involves a perspective, a target (named entities) and a sentiment value.

Kermit was thrilled about the idea!
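
The three ingredients can be made concrete as a small record type. The following is only an illustrative sketch in Python: the class name `SentimentExpression`, its field names and the value labels are assumptions made for this example, not something the slides define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentExpression:
    """One sentiment expression; all names here are illustrative assumptions."""
    perspective: str  # who holds the sentiment
    target: str       # what the sentiment is about (often a named entity)
    value: str        # e.g. "positive", "neutral" or "negative"


# The example sentence from the slide, annotated by hand.
kermit = SentimentExpression(perspective="Kermit", target="the idea", value="positive")
print(kermit)
```

Keeping perspective and target as separate fields matters because the same sentence can carry different values for different targets.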

Sentiment analysis is difficult!!

Student 1:
Sentiment   Precision   Recall
Negative    71%         90%
Neutral     96%         87%
Positive    77%         92%

Student 2:
Sentiment   Precision   Recall
Negative    79%         91%
Neutral     96%         90%
Positive    80%         92%

Student 3:
Sentiment   Precision   Recall
Negative    88%         66%
Neutral     86%         97%
Positive    91%         65%

71% of the mentions labeled "Negative" by student 1 were also labeled "Negative" by student 2 or 3 (or both).

29% of the mentions labeled "Negative" by student 1 were labeled neutral (or positive) by both the other students.

66% of the mentions labeled "Negative" by student 1 or 2 (or both) were also labeled "Negative" by student 3.

34% of the mentions labeled "Negative" by student 1 or 2 (or both) were not labeled "Negative" by student 3.
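
The way precision and recall are read here (one annotator scored against the other two) can be sketched in a few lines of Python. This is a minimal illustration, assuming that a mention counts as "Negative" for the others as soon as at least one of them chose that label; the function name `agreement_scores` and the five toy labels are hypothetical and are not the data behind the tables.

```python
def agreement_scores(own, others, label):
    """Precision and recall of one annotator's `label` against the other annotators.

    own:    labels given by the annotator being scored
    others: one label list per other annotator, in the same mention order
    A mention counts as `label` for the others if at least one of them chose it.
    """
    own_hits = [lab == label for lab in own]
    other_hits = [any(o[i] == label for o in others) for i in range(len(own))]
    both = sum(a and b for a, b in zip(own_hits, other_hits))
    precision = both / sum(own_hits) if any(own_hits) else 0.0
    recall = both / sum(other_hits) if any(other_hits) else 0.0
    return precision, recall


# Hypothetical labels for five mentions.
student1 = ["neg", "neg", "neu", "pos", "neg"]
student2 = ["neg", "neu", "neu", "pos", "neu"]
student3 = ["neg", "neu", "neu", "neu", "neg"]

precision, recall = agreement_scores(student1, [student2, student3], "neg")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data student 1 says "neg" three times, and the others agree on two of those, so precision is 2/3; the others say "neg" on two mentions and student 1 catches both, so recall is 1.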

Neutral is "easy" because 70% of all mentions are neutral.

Thus, always saying "Neutral" will be correct 70% of the time and lets you recall 100% of the neutral messages.
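
The arithmetic behind this baseline can be checked directly. The sketch below assumes a hypothetical sample of 100 mentions with the 70% neutral share quoted above; the 15/15 split of the remaining mentions and the function name `always_neutral_scores` are made up for illustration.

```python
def always_neutral_scores(labels):
    """Score a 'classifier' that answers "neutral" for every mention."""
    predictions = ["neutral"] * len(labels)
    correct = sum(p == g for p, g in zip(predictions, labels))
    n_neutral = labels.count("neutral")
    precision = correct / len(predictions)  # every prediction is "neutral"
    recall = correct / n_neutral            # every neutral mention is found
    return precision, recall


# Hypothetical sample: 70 neutral mentions out of 100.
labels = ["neutral"] * 70 + ["negative"] * 15 + ["positive"] * 15
precision, recall = always_neutral_scores(labels)
print(precision, recall)
```

A high score on the majority class therefore says little by itself; the minority classes are where a classifier has to prove itself.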

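Sentiment analysis is difficult!!

#tvvv neeeeee :( domien is out ;o ik blijf vanje houden domien!
(English: "#tvvv nooooo :( Domien is out ;o I'll keep loving you, Domien!")

Eindelijk verlost van @belgacom! Surfen gaat een pak vlotter met @telenet :-)
(English: "Finally rid of @belgacom! Surfing is a lot smoother with @telenet :-)")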

Sentiment analysis is difficult!!

#tvvv neeeeee :( domien is out ;o ik blijf vanje houden domien!

ບ"ມ$ຕ&ນໄມ)ຖ+ກອອກoຂ)າພະເຈ&າຍ5ງຮ5ກທ9ານເປ5ນຕ&ນໄມ)!

Eindelijk verlost van @belgacom! Surfen gaat een pak vlotter met @telenet :-)

ສ<ດທ)າຍຈາກຕ&ນໄມ)ເກມບ>ນແມ9ນ@າຍຂAນໄວທCມ$ປ9າໄມ)

Automatic Sentiment Analysis

Basic strategy

(1) Training phase

Mention → Tokenization, POS tagging, … → Features (unigrams)
Mention → Human annotation → Label/Action/prediction
Features (unigrams) + Label/Action/prediction → Learning → Classifier Model: feature weights per class ("count table")

(2) Operational phase

Mention → Tokenization, POS tagging, … → Features (unigrams)
Features (unigrams) + Classifier Model: feature weights per class ("count table") → classification → Label/Action/prediction

Automatic Sentiment Analysis

Training Set:

neeeeee :( domien is out                      = Negative
ik blijf vanje houden domien!                 = Positive
eindelijk verlost van @belgacom!              = Negative
surfen gaat een pak vlotter met @telenet :-)  = Positive
…                                             = …

"Bag of Words"

"neeeeee :( domien is out" = Negative

{"domien", "is", "neeeeee", "out", ":("} = Negative
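
Turning a mention into its bag of words takes very little code. The sketch below assumes plain whitespace tokenization and lowercasing, which is cruder than the tokenization and POS tagging step named in the basic strategy; the function name `bag_of_words` is hypothetical.

```python
def bag_of_words(mention):
    """Lowercase a mention, split on whitespace, and return its set of unigrams."""
    return set(mention.lower().split())


bag = bag_of_words("neeeeee :( domien is out")
print(sorted(bag))

# Word order is discarded: a shuffled mention yields the same bag.
same_bag = bag_of_words("out is domien :( neeeeee")
```

Because the result is a set, word order and repetitions are lost, which is exactly the simplification the "bag" metaphor refers to.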

Train set → Table of unigram counts:

unigram     #Negative   #Neutral   #Positive
…           …           …          …
"Ik"        3132        6245       3700
…           …           …          …
":("        365         122        58
…           …           …          …
"Domien"    22          13         14
"neeeeee"   4           1          0
…           …           …          …

⇒ P[Negative | "ik"] = 3132 / (3132 + 6245 + 3700) = 24%

⇒ P[Negative | "ik ben blij"] = ?
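
The single-unigram case is just a relative frequency over one row of the count table. The sketch below copies the four visible rows from the slide (keys lowercased for lookup); the names `COUNTS` and `class_given_unigram` are hypothetical.

```python
# Rows copied from the unigram count table on the slide (keys lowercased).
COUNTS = {
    "ik":      {"Negative": 3132, "Neutral": 6245, "Positive": 3700},
    ":(":      {"Negative": 365,  "Neutral": 122,  "Positive": 58},
    "domien":  {"Negative": 22,   "Neutral": 13,   "Positive": 14},
    "neeeeee": {"Negative": 4,    "Neutral": 1,    "Positive": 0},
}


def class_given_unigram(unigram, sentiment):
    """Share of `sentiment` among all counted occurrences of `unigram`."""
    row = COUNTS[unigram]
    return row[sentiment] / sum(row.values())


print(f'{class_given_unigram("ik", "Negative"):.0%}')  # 24%
print(f'{class_given_unigram(":(", "Negative"):.0%}')  # 67%
```

A frequent, uninformative word such as "ik" stays close to the overall class distribution, while ":(" shifts the estimate strongly towards Negative. This lookup does not extend to a whole mention such as "ik ben blij", which has no row of its own in the table.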

Bayes rule of conditional probabilities:

P[Negative | "ik ben blij"] = P[Negative] x P["ik ben blij" | Negative] / P["ik ben blij"]

- Prior (over all mentions): P[Negative]
- Likelihood: P["ik ben blij" | Negative]
- Evidence (same for all sentiments): P["ik ben blij"]

Chain rule:

P["ik ben blij" | Negative] = P["ik" | Negative]                (unigram)
                            x P["ben" | Negative, "ik"]         (bigram)
                            x P["blij" | Negative, "ik ben"]    (trigram)
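
Naïve Bayes, the baseline named on the next slide, simplifies the chain rule by replacing every factor with its unigram version P[word | class]. The sketch below illustrates that idea under several assumptions: whitespace tokenization, add-one smoothing for unseen words, and a training set consisting only of the four labeled mentions from the training-set slide. The function names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count mentions per class and unigram occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return class_counts, word_counts


def classify(text, class_counts, word_counts):
    """Pick the class maximizing log prior + sum of log unigram likelihoods.

    The evidence term is the same for every class, so it is left out.
    """
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)  # prior
        for word in text.lower().split():
            # Add-one smoothing (an assumption) avoids log(0) for unseen words.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# The four labeled mentions from the training-set slide.
TRAIN = [
    ("neeeeee :( domien is out", "Negative"),
    ("ik blijf vanje houden domien!", "Positive"),
    ("eindelijk verlost van @belgacom!", "Negative"),
    ("surfen gaat een pak vlotter met @telenet :-)", "Positive"),
]
class_counts, word_counts = train(TRAIN)
prediction = classify("domien is out :(", class_counts, word_counts)
print(prediction)
```

Working with sums of logarithms instead of products of probabilities avoids numerical underflow on longer mentions.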

Improvements over Naïve Bayes

- Better features:
  - Bigrams, trigrams
  - Parts of speech
  - Tf/idf weighting
  - Grammatical dependencies (e.g. negation marking)
  - Named entities
- Alternative strategies to calculate feature weights from counts
  - Transformed Normalized Weighted Naïve Bayes
  - Mutual Information
  - Maximum entropy
- Other approaches
  - Sentiment lexicons (cf. current classifier)
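
Two of the feature ideas listed above are easy to sketch. The example below extracts n-grams and applies a crude negation marker; note that the marker simply prefixes every token after a cue word, which is a simplified stand-in for dependency-based negation marking (that would need a parser). The cue list `NEGATORS` and both function names are hypothetical.

```python
def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


NEGATORS = {"not", "no", "never"}  # hypothetical, English-only negation cues


def mark_negation(tokens):
    """Prefix every token that follows a negation cue with NOT_."""
    marked, negated = [], False
    for token in tokens:
        if token in NEGATORS:
            negated = True
            marked.append(token)
        else:
            marked.append("NOT_" + token if negated else token)
    return marked


tokens = "this is not a great idea".split()
bigrams = ngrams(tokens, 2)
marked = mark_negation(tokens)
print(bigrams)
print(marked)
```

With either feature, "great" in a negated context no longer shares its counts with "great" in a positive one, which is what a plain unigram model cannot express.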

Evaluation

- In terms of Precision, Recall, F1, Accuracy, …
- Very good on "simple" tasks (comparable to humans)
  - e.g. spam detection
  - In general, tasks for which grammar and context are not important (negation, source/target/perspective roles, …)
- But rather bad on "difficult" tasks, including sentiment analysis (worse than humans)
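
For reference, the four measures named above can be computed from a list of gold labels and a list of predictions. This is a minimal sketch with hypothetical toy labels; the function name `evaluate` is an assumption, and the numbers it prints are unrelated to the results reported in these slides.

```python
def evaluate(gold, predicted, label):
    """Precision, recall and F1 for one class, plus overall accuracy."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)  # true positives
    fp = sum(g != label and p == label for g, p in pairs)  # false positives
    fn = sum(g == label and p != label for g, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    return precision, recall, f1, accuracy


# Hypothetical gold labels and classifier output for six mentions.
gold      = ["neg", "neg", "neu", "neu", "pos", "pos"]
predicted = ["neg", "neu", "neu", "neu", "pos", "neg"]
precision, recall, f1, accuracy = evaluate(gold, predicted, "neg")
print(precision, recall, f1, round(accuracy, 2))
```

F1 is the harmonic mean of precision and recall, so it stays low unless both are high; accuracy is computed over all classes at once and can hide poor performance on rare classes.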

Student 1:
Sentiment   Precision   Recall
Negative    71%         90%
Neutral     96%         87%
Positive    77%         92%

Maxent 2-grams:
Sentiment   Precision   Recall
Negative    79%         72%
Neutral     76%         76%
Positive    77%         73%

Current classifier:
Sentiment   Precision   Recall
Negative    42%         43%
Neutral     83%         60%
Positive    38%         76%

(Results maxent/current for balanced English student dataset)

Many unresolved issues…

- Other languages (unsupervised learning/bootstrapping)
- Source/target resolution
- Classifiers trained on one dataset/topic do not perform well on other datasets/topics
- …

…and opportunities

Many information extraction problems can be cast as classification problems

- Assigning tags to mentions
- Predicting the number of likes/retweets/… of mentions
- Deciding to whom to send/assign a message
- …
- In general, any problem where things must be "labeled", "decided" or "predicted", with a limited number of alternatives, and for which training data is available (can be user feedback!)
- And our users generate massive amounts of data!!

→ Don't hesitate to discuss ideas with me! ←

Part 2: Clojure

- Dynamic programming language targeting the JVM (and JavaScript)
- Combining the interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming
- Lisp dialect:
  - (almost) no syntax
      (+ 1 2) => 3
      (list '+ 1 2) => (+ 1 2)
  - Code as data
      (eval (list '+ 1 2)) => 3

Part 2: Clojure

- Project management through "leiningen"
  - bash$ lein new test-project
  - Add dependencies to project.clj, add code to src/test-project
  - bash$ lein uberjar
      => test-project.jar
  - bash$ java -jar test-project.jar
- Online demo…
