Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs
A. Elizabeth Cano (@pixarelli)
amparo.cano@open.ac.uk
Yulan He
y.he9@aston.ac.uk
Harith Alani
h.alani@open.ac.uk
3. Introduction Representing Topics in Dynamic Environments
Techniques for topic classification of Social Media
are sensitive to the evolution of topics
Example tweets and their salient terms, by year:
• 2011
  Tweet: 3 dead in protest in Egypt. Security official vows to ‘deal firmly..#Jan24
  Terms: #Jan24, dead, Egypt, protest, security
• 2012
  Tweet: Egypt Pres Morsi uses his visit to Tehran to praise the Syrian uprising
  Terms: Egypt, Pres Morsi, Tehran, Syrian, uprising
• 2013
  Tweet: #Boston bombing suspect “pinned down” on boat in Watertown
  Terms: Boston, bombing, suspect, Watertown
• 2014
  Tweet: Why Obama needs to rethink his entire ISIS strategy…
  Terms: Obama, strategy, ISIS
4. Introduction
Challenges
• Keeping a model up to date requires regular retuning.
• Manual annotation is expensive.
Questions
• Which feature types provide a more stable representation of a topic?
5. Introduction Previous work
Using local features
• Bag of Words (BoW) [Genc et al., 2011]
• BoW + Bag of Entities (BoE) [Vitale et al., 2012]
• BoW + BoE + Part of Speech (PoS) tagging [Munoz et al., 2011] [Varga et al., 2012]
Exploiting the link structure of a Knowledge Source
• Exploiting categories containing entities [Michelson et al., 2010]
• Relating tweets with Wikipedia resources [Milne et al., 2008] [Xu et al., 2011]
• Use of semantic features for topic classification [Cano et al., 2013] [Varga et al., 2014]
7. Introduction Characterising Topic Changes with DBpedia
Some features remain unchanged; others provide information about past, current or future contexts (e.g. dbp:UnitedStatesPresidentialCandidates).
[Figure: the DBpedia graph around dbp:Barack_Obama across the DBpedia 3.6, 3.7 and 3.8 snapshots.
Edge labels: rdf:type, rdfs:subClassOf, dbo:birthPlace, dbo:spouse, dbo:author, dbo:leader, skos:subject, dbo:wikiPageWikiLink.
Nodes: yago:PresidentOfTheUnitedStates, dbo:Person, dbp:Hawaii, dbp:Michelle_Obama, dbp:The_Audacity_of_Hope, dbp:Dreams_from_My_Father, category:Community_organisers, category:Columbia_University_Alumni, dbp:United_States_National_Council, dbp:National_Science_and_Techology, dbp:Al-Qaeda, category:United_States_presidential_candidates,_2012, dbp:Budget_Control_Act_of_2011.]
8. Approach DBpedia Graph Snapshots
Definition: Time-dependent Resource Meta Graph
A Time-dependent Resource Meta Graph is a sequence of tuples $G := (R, P, C, Y, f_t)$ where
• $R$, $P$, $C$ are finite sets whose elements are resources, properties and classes;
• $Y$ is a ternary relation $Y \subseteq R \times P \times C$ representing a hypergraph with ternary edges;
• the hypergraph of $Y$ is a tripartite graph $H(Y) = \langle V, D \rangle$, where the vertices are $V = R \cup P \cup C$ and the edges are $D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$;
• $f_t$ assigns a temporal marker to each ternary edge.
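The definition above can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class name TimeStampedMetaGraph, its method names, and the choice to let one edge carry several markers (one per snapshot in which it appears) are all assumptions, and the marker assignments in the usage lines are made up.

```python
from collections import defaultdict


class TimeStampedMetaGraph:
    """Illustrative container for G := (R, P, C, Y, f_t); names are hypothetical."""

    def __init__(self):
        # f_t: maps each ternary edge (r, p, c) to the markers under which it was seen.
        # Assumption: an edge present in several snapshots carries several markers.
        self.markers = defaultdict(set)

    def add_edge(self, resource, prop, cls, marker):
        self.markers[(resource, prop, cls)].add(marker)

    def snapshot(self, marker):
        """Return Y restricted to the edges that carry the given temporal marker."""
        return {edge for edge, seen in self.markers.items() if marker in seen}

    def hypergraph(self):
        """Return (V, D): V collects every element, D holds one 3-element set per edge."""
        vertices, edges = set(), set()
        for edge in self.markers:
            vertices.update(edge)
            edges.add(frozenset(edge))
        return vertices, edges


# Usage with made-up marker assignments (DBpedia version labels as markers).
graph = TimeStampedMetaGraph()
obama = "dbp:Barack_Obama"
graph.add_edge(obama, "rdf:type", "dbo:Person", "3.6")
graph.add_edge(obama, "rdf:type", "dbo:Person", "3.7")
graph.add_edge(obama, "rdf:type", "yago:PresidentOfTheUnitedStates", "3.7")

print(sorted(graph.snapshot("3.6")))
print(sorted(graph.snapshot("3.7")))
```

Storing the markers per edge keeps a single graph for all snapshots, so a snapshot at a given marker is just a filter over the edges.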
9. Approach Semantic Representation of a Tweet
Entities: <dbp:Hosni_Mubarak>, <dbp:Barack_Obama>, <dbp:Egypt>, <dbp:CNN>
dbp: http://dbpedia.org/resource/
10. Approach Semantic Representation of a Tweet
Class Features (rdf:type)
• <dbp:Hosni_Mubarak> rdf:type <dbo:OfficeHolder>
• <dbp:Barack_Obama> rdf:type <yago:NobelPeacePrizeLaureates>
• <dbp:Egypt> rdf:type <dbo:Country>
• <dbp:CNN> rdf:type <dbo:Broadcaster>
dbo: http://dbpedia.org/ontology/
11. Approach Semantic Representation of a Tweet
Property Features
• <dbp:Hosni_Mubarak> dbprop:title <dbp:Prime_Minister_of_Egypt>
• <dbp:Barack_Obama> dbprop:nationality American
• <dbp:Egypt> dbprop:languages <dbp:Egyptian_Arabic>
• <dbp:CNN> dbprop:headquarters <dbp:Atlanta>
12. Approach Semantic Representation of a Tweet
Category Features (skos)
• <dbp:Hosni_Mubarak> dcterms:subject <skos:PresidentsOfEgypt>
• <dbp:Barack_Obama> dcterms:subject <skos:Presidents_of_the_United_States>
• <dbp:Egypt> dcterms:subject <skos:Arab_republics>
• <dbp:CNN> dcterms:subject <skos:English-language_television_stations>
skos: http://dbpedia.org/resource/Category:
13. Approach Semantic Representation of a Tweet
Resource Features
• <dbp:Hosni_Mubarak> dbprop:title <dbp:Prime_Minister_of_Egypt>
• <dbp:Barack_Obama> dbprop:commander <dbp:Death_Of_Osama_Bin_Laden>
• <dbp:Egypt> dbprop:languages <dbp:Egyptian_Arabic>
• <dbp:CNN> dbprop:headquarters <dbp:Atlanta>
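The four feature types on these slides (class, property, category, resource) can be read off the triples attached to a tweet's entities. The sketch below is one possible reading, not the authors' code: the function name semantic_features is hypothetical, and it assumes that property features are the predicates themselves and that resource features are the object resources, with literals such as "American" left out.

```python
RDF_TYPE = "rdf:type"
DCTERMS_SUBJECT = "dcterms:subject"


def semantic_features(entities, triples):
    """Group the triples of the given entities into the four feature types."""
    features = {"class": set(), "category": set(), "property": set(), "resource": set()}
    for subject, predicate, obj in triples:
        if subject not in entities:
            continue
        if predicate == RDF_TYPE:
            features["class"].add(obj)
        elif predicate == DCTERMS_SUBJECT:
            features["category"].add(obj)
        else:
            features["property"].add(predicate)
            # Assumption: only objects that are DBpedia resources count, not literals.
            if obj.startswith("dbp:"):
                features["resource"].add(obj)
    return features


# A handful of triples in the spirit of the slides' example tweet.
triples = [
    ("dbp:Hosni_Mubarak", "rdf:type", "dbo:OfficeHolder"),
    ("dbp:Hosni_Mubarak", "dbprop:title", "dbp:Prime_Minister_of_Egypt"),
    ("dbp:Hosni_Mubarak", "dcterms:subject", "skos:PresidentsOfEgypt"),
    ("dbp:Barack_Obama", "dbprop:nationality", "American"),
    ("dbp:Egypt", "rdf:type", "dbo:Country"),
    ("dbp:Egypt", "dbprop:languages", "dbp:Egyptian_Arabic"),
]
entities = {"dbp:Hosni_Mubarak", "dbp:Barack_Obama", "dbp:Egypt"}
features = semantic_features(entities, triples)
for kind, values in features.items():
    print(kind, sorted(values))
```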
14. Approach DBpedia Graph Snapshots
The meta-graph of an entity e at time t is the aggregation of all resources, properties and classes related to that entity at time t.
[Figure: properties and resources of <dbp:Barack_Obama> across DBpedia 3.6, 3.7, 3.8, …. prop:spouse <MichelleObama> and prop:birthPlace <Hawaii> appear in each of the three snapshots; later snapshots add prop:commander and prop:wikiPageWikiLink edges involving <dbp:Death_Of_Osama_Bin_Laden>, <UnitedStatesPresidentialCandidates> and <Budget_Control_Act_of_2011>.]
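Aggregating an entity's edges per snapshot and comparing two snapshots could look like the sketch below. The helper names meta_graph and compare are hypothetical, and the pairing of prop:wikiPageWikiLink with <Budget_Control_Act_of_2011> in the later snapshot is an assumption made for illustration only.

```python
def meta_graph(entity, triples):
    """Aggregate the (property, value) pairs attached to `entity` in one snapshot."""
    return {(prop, value) for subject, prop, value in triples if subject == entity}


def compare(older, newer):
    """Split two meta-graphs into edges that persist, appear, or disappear."""
    return {
        "stable": older & newer,
        "added": newer - older,
        "removed": older - newer,
    }


entity = "dbp:Barack_Obama"
snapshot_36 = [
    (entity, "prop:spouse", "MichelleObama"),
    (entity, "prop:birthPlace", "Hawaii"),
]
# Assumed content of a later snapshot (property/object pairing is illustrative).
snapshot_later = snapshot_36 + [
    (entity, "prop:wikiPageWikiLink", "Budget_Control_Act_of_2011"),
]

changes = compare(meta_graph(entity, snapshot_36), meta_graph(entity, snapshot_later))
print(sorted(changes["stable"]))
print(sorted(changes["added"]))
```

Edges in the "stable" set are candidates for features that hold across epochs, while the "added" set carries the newer context.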
15. Approach Semantic Feature Weighting Strategies
Topic Relevance-based Weighting Strategy:
Characterise the global relevance of a semantic feature to a given topic in DBpedia at a given point in time.
[Figure: the DBpedia graph and the topic graph within the DBpedia graph.]
17. Approach Semantic Feature Weighting Strategies
Integrating weights into a Tweet representation
Weight of a semantic feature f in a document x:
$$W_x(f)_{DB_t} = \left[ \frac{N_x(f)_{DB_t} + 1}{|F| + \sum_{f' \in F} N_x(f')_{DB_t}} \right] \ast \left[ W_{DB_t}(f) \right]^{1/2}$$
• First factor: frequency of f in x with Laplace smoothing.
• Second factor: weight derived from the DB_t graph.
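A minimal sketch of this weighting, assuming N counts how often a feature occurs in the document, F is the feature vocabulary, and the graph-derived weight for each feature is already available. The function name feature_weights and the numeric graph weights below are hypothetical; how the graph weight itself is computed is not covered here.

```python
import math
from collections import Counter


def feature_weights(doc_features, vocabulary, graph_weight):
    """Laplace-smoothed frequency of each feature times the square root of its graph weight."""
    counts = Counter(doc_features)
    # Denominator: |F| plus the total count of vocabulary features in the document.
    denominator = len(vocabulary) + sum(counts[f] for f in vocabulary)
    return {
        f: (counts[f] + 1) / denominator * math.sqrt(graph_weight.get(f, 0.0))
        for f in vocabulary
    }


vocabulary = ["dbo:Country", "dbo:OfficeHolder", "dbo:Broadcaster"]
doc_features = ["dbo:Country", "dbo:Country", "dbo:OfficeHolder"]
# Hypothetical graph-derived weights, chosen only to make the arithmetic easy to follow.
graph_weight = {"dbo:Country": 0.25, "dbo:OfficeHolder": 1.0, "dbo:Broadcaster": 0.0}

weights = feature_weights(doc_features, vocabulary, graph_weight)
for feature, weight in weights.items():
    print(feature, round(weight, 4))
```

With three vocabulary features and three occurrences the denominator is 6, so dbo:Country gets (2 + 1) / 6 times sqrt(0.25).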
18. Experiments Framework for Twitter Topic Classification with DBpedia
• Do semantic features built from DBpedia graphs aid cross-epoch topic classification of tweets?
• Which feature type provides a more stable topic representation over time?
19. Experiments Framework for Twitter Topic Classification with DBpedia
[Figure: framework overview pairing microposts (2010, 2011, 2013) with resources from DBpedia dumps (3.6, 3.7, 3.8, 3.9).]
20. Experiments Datasets
Tweets
Year    Period     Tweets
2010    Nov-Dec    1x10^6
2011    Aug        1x10^6
2013    Sep        1x10^6
Each tweet is assigned a topic label from a pool of over 10 categories.
Violence Related Topics
• Disaster and Accident (D&A)
• Law and Crime (L&C)
• War and Conflict (W&C)
Manual annotation was performed until reaching 1K tweets per year per topic.
Negative set: 1K tweets per year from topics other than the 3 above.
12K annotated tweets in total (3 topics x 3 years x 1K, plus 3 years x 1K negative).
26. Experiments Understanding the Stability of a Topic Representation
Same epoch Scenario
[Figure: lexical, semantic and combined representations; train and test sets both come from epoch t.]
27. Experiments Epoch Scenarios
Same epoch Scenario (Trained on 2010 - Tested on 2010)
All the experiments reported in our paper were conducted using a 10-fold cross-validation setting.
F1 per topic:
            Disaster_Acc  Law_Crime  War_Conflict
BoW         0.831         0.765      0.844
Category    0.697         0.650      0.744
Property    0.680         0.639      0.720
Resource    0.692         0.637      0.762
Class       0.633         0.583      0.637
28. Experiments Understanding the Stability of a Topic Representation
Same epoch Scenario
[Figure: lexical, semantic and combined representations; train and test sets both come from epoch t.]
Cross-epoch Scenario
[Figure: train on epoch t, test on epoch t+1.]
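The cross-epoch scenario (train on epoch t, test on epoch t+1) can be sketched as a small evaluation harness. Everything here is illustrative: the function names, the toy overlap classifier and the toy examples are assumptions, and no claim is made about which classifier the experiments used.

```python
def f1_score(gold, predicted):
    """F1 for the positive class, computed from boolean gold and predicted labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_epoch_f1(train, test, fit, predict):
    """Fit on the epoch-t examples, then score on the epoch-(t+1) examples."""
    model = fit(train)
    gold = [label for _, label in test]
    predicted = [predict(model, feats) for feats, _ in test]
    return f1_score(gold, predicted)


# Toy classifier: remember features seen only in positive training examples.
def fit(examples):
    positive = set().union(*(feats for feats, label in examples if label))
    negative = set().union(*(feats for feats, label in examples if not label))
    return positive - negative


def predict(model, feats):
    return bool(model & feats)


# Hypothetical examples: (feature set, belongs to topic?).
epoch_t = [({"dbo:Country", "dbo:OfficeHolder"}, True), ({"dbo:Broadcaster"}, False)]
epoch_t1 = [({"dbo:Country"}, True), ({"dbo:Broadcaster"}, False), ({"dbo:Person"}, True)]

score = cross_epoch_f1(epoch_t, epoch_t1, fit, predict)
print(round(score, 3))
```

Passing fit and predict as arguments keeps the harness independent of the feature type, so the same code can score lexical, semantic or combined representations.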
29. Experiments Epoch Scenarios
Cross-epoch Scenario (Trained on 2010 - Tested on X)
Disaster_Acc, F1 per cross-epoch setting:
            2010-2011  2010-2013  2011-2013  Average
BoW         0.634      0.481      0.261      0.458
Category    0.683      0.539      0.524      0.582
Property    0.665      0.557      0.502      0.574
Resource    0.774      0.544      0.445      0.587
Class       0.691      0.665      0.669      0.675
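The Average column appears to be the plain mean of the three cross-epoch F1 scores in each row. A quick check for three of the rows, using the per-setting values from the table (the variable names are illustrative):

```python
# Per-setting F1 scores for Disaster_Acc: 2010-2011, 2010-2013, 2011-2013.
rows = {
    "BoW": (0.634, 0.481, 0.261),
    "Category": (0.683, 0.539, 0.524),
    "Class": (0.691, 0.665, 0.669),
}

# Mean over the three cross-epoch settings for each feature type.
averages = {name: sum(scores) / len(scores) for name, scores in rows.items()}
for name, value in averages.items():
    print(name, f"{value:.4f}")
```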
30. Experiments Epoch Scenarios
Averaged Cross-epoch Scenarios (F1)
            Disaster_Acc  Law_Crime  War_Conflict  Average
BoW         0.458         0.620      0.531         0.536
Category    0.582         0.537      0.453         0.55
Property    0.574         0.504      0.506         0.528
Resource    0.587         0.578      0.466         0.544
Class       0.675         0.647      0.664         0.665
31. Conclusions
• Semantic features are much slower to decay than lexical features.
• Semantic representations improve performance in cross-epoch scenarios.
• Class-based features alone achieve on average a gain of 7% over lexical features in cross-epoch scenarios.
32. Future Work
• Concept-drift tracking for transfer learning using
Linked Data sources.
• Study cross-epoch transfer learning approaches
using semantic features.