Stretching the Life of Twitter 
Classifiers with Time-Stamped 
Semantic Graphs 
A. Elizabeth Cano (@pixarelli) 
amparo.cano@open.ac.uk 
Yulan He 
y.he9@aston.ac.uk 
Harith Alani 
h.alani@open.ac.uk 
1
Introduction Social Media Streams 
2
Introduction Representing Topics in 
Dynamic Environments 
Topic classification techniques for social media
are sensitive to the evolution of topics.
[Figure: example tweets on a timeline from 2011 to 2014, with their key terms]
2011: "3 dead in protest in Egypt. Security official vows to ‘deal firmly..#Jan24" (dead, Egypt, protest, security, #Jan24)
2012: "Egypt Pres Morsi uses his visit to Tehran to praise the Syrian uprising" (Egypt, Pres Morsi, Tehran, Syrian, uprising)
2013: "#Boston bombing suspect “pinned down” on boat in Watertown" (Boston, bombing, suspect, Watertown)
2014: "Why Obama needs to rethink his entire ISIS strategy…" (Obama, strategy, ISIS)
3
Introduction 
Challenges 
• Keeping the model up to date requires regular
retuning.
• Manual annotation is expensive.
Questions 
• Which feature types provide a more stable 
representation of a topic? 
4
Introduction Previous work 
Using local features 
• Bag of Words (BoW) [Genc et al., 2011]
• BoW + Bag of Entities (BoE) [Vitale et al., 2012]
• BoW + BoE + Part of Speech (PoS) tagging [Munoz et al., 2011] [Varga et al., 2012]
Exploiting the link structure of a Knowledge Source
• Exploiting categories containing entities [Michelson et al., 2010]
• Relating tweets with Wikipedia resources [Milne et al., 2008] [Xu et al., 2011]
• Use of semantic features for topic classification [Cano et al., 2013] [Varga et al., 2014]
5
Introduction Topic Evolution 
[Figure: a topic in a Twitter corpus represented at epochs t and t+1, with semantic and lexical features]
6
Introduction Characterising Topic Changes 
with DBpedia 
Some features remain unchanged across DBpedia versions, while others
provide information about past, current or future
contexts (e.g. dbp:UnitedStatesPresidentialCandidates).
[Figure: the neighbourhood of dbp:Barack_Obama in DBpedia 3.6, 3.7 and 3.8, linked through rdf:type (yago:PresidentOfTheUnitedStates, an rdfs:subClassOf dbo:Person), dbo:birthPlace (dbp:Hawaii), dbo:spouse (dbp:Michelle_Obama), dbo:author (dbp:The_Audacity_of_Hope, dbp:Dreams_from_My_Father), skos:subject (category:Community_organisers, category:Columbia_University_Alumni, category:United_States_presidential_candidates,_2012), dbo:leader (dbp:United_States_National_Council, dbp:National_Science_and_Technology) and dbo:wikiPageWikiLink (dbp:Al-Qaeda, dbp:Budget_Control_Act_of_2011)]
7
Approach DBpedia Graph Snapshots 
Definition: Time-dependent Resource Meta Graph
A time-dependent resource meta graph is a sequence of tuples $G := (R, P, C, Y, f_t)$ where
• $R$, $P$ and $C$ are finite sets whose elements are resources, properties and classes;
• $Y$ is a ternary relation $Y \subseteq R \times P \times C$ representing a hypergraph with ternary edges;
• $Y$ can be viewed as a tripartite hypergraph $H(Y) = \langle V, D \rangle$ whose vertices are $V = R \cup P \cup C$ and whose edges are $D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$;
• $f_t$ assigns a temporal marker to each ternary edge.
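The definition can be made concrete with a small data structure. The sketch below is illustrative only: the class name `TimeStampedMetaGraph`, its methods, and the use of DBpedia version strings such as "3.6" as temporal markers are our assumptions, and the example triples carry toy markers rather than data taken from real DBpedia releases. It stores each ternary edge $(r, p, c)$ with the markers $f_t$ assigns to it, and derives the snapshot $(R, P, C, Y)$ and the hypergraph $\langle V, D \rangle$ for a given marker.

```python
from dataclasses import dataclass, field


@dataclass
class TimeStampedMetaGraph:
    """Illustrative sketch of G := (R, P, C, Y, f_t); all names here are hypothetical."""

    # Maps each ternary edge (r, p, c) to the set of temporal markers f_t assigns it.
    edges: dict = field(default_factory=dict)

    def add(self, r: str, p: str, c: str, t: str) -> None:
        # Record that the edge (r, p, c) holds at time t (e.g. a DBpedia version).
        self.edges.setdefault((r, p, c), set()).add(t)

    def snapshot(self, t: str):
        # Restrict Y to the edges marked with t, then derive R, P and C from them.
        Y = {e for e, markers in self.edges.items() if t in markers}
        R = {r for r, _, _ in Y}
        P = {p for _, p, _ in Y}
        C = {c for _, _, c in Y}
        return R, P, C, Y

    def hypergraph(self, t: str):
        # H(Y) = <V, D>: V = R ∪ P ∪ C, and each triple becomes an undirected
        # hyperedge {r, p, c} (assumes R, P and C are disjoint, as in the definition).
        R, P, C, Y = self.snapshot(t)
        return R | P | C, {frozenset(e) for e in Y}


# Toy example with hypothetical markers.
g = TimeStampedMetaGraph()
g.add("dbp:Barack_Obama", "rdf:type", "yago:PresidentOfTheUnitedStates", "3.6")
g.add("dbp:Barack_Obama", "rdf:type", "yago:PresidentOfTheUnitedStates", "3.7")
g.add("dbp:Barack_Obama", "rdf:type", "yago:NobelPeacePrizeLaureates", "3.7")
V, D = g.hypergraph("3.7")
print(len(V), len(D))  # 4 2
```

Keeping the markers per edge, rather than storing one full graph per version, means a snapshot is just a filter over a single edge table.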
8
Approach Semantic Representation of a 
Tweet 
Entities: <dbp:Hosni_Mubarak>, <dbp:Barack_Obama>, <dbp:Egypt>, <dbp:CNN>
dbp: http://dbpedia.org/resource/
9
Approach Semantic Representation of a 
Tweet 
Class Features (rdf:type)
<dbp:Hosni_Mubarak> rdf:type <dbo:OfficeHolder>
<dbp:Barack_Obama> rdf:type <yago:NobelPeacePrizeLaureates>
<dbp:Egypt> rdf:type <dbo:Country>
<dbp:CNN> rdf:type <dbo:Broadcaster>
dbo: http://dbpedia.org/ontology/
10
Approach Semantic Representation of a 
Tweet 
Property Features
<dbp:Hosni_Mubarak> dbprop:title <dbp:Prime_Minister_of_Egypt>
<dbp:Barack_Obama> dbprop:nationality "American"
<dbp:CNN> dbprop:headquarters <dbp:Atlanta>
<dbp:Egypt> dbprop:languages <dbp:Egyptian_Arabic>
11
skos: http://dbpedia.org/resource/Category:
Approach Semantic Representation of a 
Tweet 
Category Features (skos)
<dbp:Hosni_Mubarak> dcterms:subject <skos:PresidentsOfEgypt>
<dbp:Barack_Obama> dcterms:subject <skos:Presidents_of_the_United_States>
<dbp:Egypt> dcterms:subject <skos:Arab_republics>
<dbp:CNN> dcterms:subject <skos:English-language_television_stations>
12
Approach Semantic Representation of a 
Tweet 
Resource Features
<dbp:Hosni_Mubarak> dbprop:title <dbp:Prime_Minister_of_Egypt>
<dbp:Barack_Obama> dbprop:commander <dbp:Death_Of_Osama_Bin_Laden>
<dbp:CNN> dbprop:headquarters <dbp:Atlanta>
<dbp:Egypt> dbprop:languages <dbp:Egyptian_Arabic>
13
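The four feature types on the preceding slides can be sketched as a simple split over the triples attached to a tweet's entities. This is a minimal sketch under stated assumptions, not the authors' extraction code: the function name `semantic_features` is hypothetical, the triples are a hand-copied subset of the slide examples rather than a DBpedia query result, property features are assumed to be the property names themselves, and resource features are assumed to be the linked objects that are DBpedia resources (prefix `dbp:`).

```python
from collections import Counter

# Toy triples copied from the example slides; NOT a real DBpedia query result.
TRIPLES = [
    ("dbp:Hosni_Mubarak", "rdf:type", "dbo:OfficeHolder"),
    ("dbp:Barack_Obama", "rdf:type", "yago:NobelPeacePrizeLaureates"),
    ("dbp:Egypt", "rdf:type", "dbo:Country"),
    ("dbp:CNN", "rdf:type", "dbo:Broadcaster"),
    ("dbp:Hosni_Mubarak", "dcterms:subject", "skos:PresidentsOfEgypt"),
    ("dbp:Egypt", "dcterms:subject", "skos:Arab_republics"),
    ("dbp:Hosni_Mubarak", "dbprop:title", "dbp:Prime_Minister_of_Egypt"),
    ("dbp:Barack_Obama", "dbprop:nationality", "American"),
    ("dbp:CNN", "dbprop:headquarters", "dbp:Atlanta"),
    ("dbp:Egypt", "dbprop:languages", "dbp:Egyptian_Arabic"),
]


def semantic_features(entities, triples):
    """Hypothetical split of the entities' triples into the four feature types."""
    feats = {name: Counter() for name in ("class", "category", "property", "resource")}
    for s, p, o in triples:
        if s not in entities:
            continue  # only triples about entities found in the tweet
        if p == "rdf:type":
            feats["class"][o] += 1
        elif p == "dcterms:subject":
            feats["category"][o] += 1
        else:
            feats["property"][p] += 1
            if o.startswith("dbp:"):  # literals such as "American" are not resources
                feats["resource"][o] += 1
    return feats


f = semantic_features({"dbp:Egypt", "dbp:CNN"}, TRIPLES)
print(f["class"], f["resource"])
```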
Approach DBpedia Graph Snapshots 
That is, the meta-graph of an entity e at time t is the
aggregation of all resources, properties and classes
related to e at time t.
[Figure: properties and resources of <dbp:Barack_Obama> in DBpedia 3.6, 3.7, 3.8, …; prop:spouse <MichelleObama> and prop:birthPlace <Hawaii> appear in every version shown, while later versions add prop:commander and prop:wikiPageWikiLink links to <UnitedStatesPresidentialCandidates>, <dbp:Death_Of_Osama_Bin_Laden> and <Budget_Control_Act_of_2011>]
14
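Comparing two snapshots of an entity's meta-graph separates the features that stay stable from those that appear or disappear. The sketch below is a minimal illustration with set operations; the function name `compare_snapshots` and the two feature sets (loosely based on the slide, with simplified version contents) are hypothetical.

```python
def compare_snapshots(old: set, new: set) -> dict:
    """Split an entity's (property, value) features into stable, removed and added."""
    return {"stable": old & new, "removed": old - new, "added": new - old}


# Hypothetical snapshots of dbp:Barack_Obama, loosely based on the slide.
v36 = {("prop:spouse", "MichelleObama"), ("prop:birthPlace", "Hawaii")}
v38 = v36 | {("prop:wikiPageWikiLink", "dbp:Death_Of_Osama_Bin_Laden")}

diff = compare_snapshots(v36, v38)
print(sorted(diff["added"]))
```

Features in `stable` are the ones that would keep a classifier's representation consistent across epochs, while `added` features carry the newer context.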
Approach Semantic Feature Weighting 
Strategies 
Topic Relevance-based Weighting Strategy: 
Characterise the global relevance of a semantic feature to a 
given topic in DBpedia at a given point in time. 
[Figure: a topic graph within the DBpedia graph]
15
Approach Semantic Feature Weighting 
Strategies 
Topic Relevance-based Weighting Strategy: 
• Class-based Topic Relevance (ClsW) 
• Property-based Topic Relevance (PropW) 
• Category-based Topic Relevance (CatW) 
• Resource Relevance (ResW) 
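The slides name the four topic relevance strategies but do not give their formulas. Purely for illustration, the sketch below assumes one simple reading, which is not the paper's ClsW/PropW/CatW/ResW definition: the relevance of a feature f to a topic at time t is the share of entities whose meta-graph at t contains f that belong to the topic. The function name `topic_relevance` and the toy entities `e1` to `e3` are hypothetical.

```python
def topic_relevance(feature, topic_entities, meta_graphs):
    """Hypothetical relevance: share of entities carrying `feature` that belong to the topic.

    meta_graphs maps each entity to the set of features in its meta-graph at time t.
    """
    holders = {e for e, feats in meta_graphs.items() if feature in feats}
    if not holders:
        return 0.0  # feature absent from this snapshot
    return len(holders & topic_entities) / len(holders)


# Toy snapshot: e1 and e2 belong to the topic, e3 does not.
meta_graphs = {
    "e1": {"dbo:Organisation", "dbo:Agent"},
    "e2": {"dbo:Event"},
    "e3": {"dbo:Organisation"},
}
topic = {"e1", "e2"}
print(topic_relevance("dbo:Organisation", topic, meta_graphs))
```

The same function applies to any of the four feature types, as long as each entity's meta-graph is represented as a set of class, property, category or resource identifiers.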
16
Approach Semantic Feature Weighting 
Strategies 
Integrating weights into a tweet representation
The weight of a semantic feature f in a document x combines the frequency of f in x with Laplace smoothing and the weight derived from the DB_t graph:
$$W_x^{DB_t}(f) = \left[ \left( \frac{N_x^{DB_t}(f) + 1}{|F| + \sum_{f' \in F} N_x^{DB_t}(f')} \right) \cdot W^{DB_t}(f) \right]^{1/2}$$
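A direct transcription of the weighting formula, as a minimal sketch: the function name `feature_weight` and its arguments are ours, $F$ is assumed to be the set of semantic features of the given type, `counts` holds $N_x^{DB_t}(f')$ for document x, and `graph_weight` holds $W^{DB_t}(f)$ from one of the topic relevance strategies (features missing from it get weight 0, which is also an assumption).

```python
import math


def feature_weight(f, counts, vocabulary, graph_weight):
    """W_x^{DB_t}(f): sqrt(Laplace-smoothed frequency of f in x * W^{DB_t}(f))."""
    # (N_x(f) + 1) / (|F| + sum over f' in F of N_x(f'))
    smoothed = (counts.get(f, 0) + 1) / (
        len(vocabulary) + sum(counts.get(g, 0) for g in vocabulary)
    )
    return math.sqrt(smoothed * graph_weight.get(f, 0.0))


# Toy class features for one tweet; the weights are hypothetical.
vocabulary = ["dbo:Country", "dbo:Broadcaster", "dbo:OfficeHolder", "dbo:Person"]
counts = {"dbo:Country": 2, "dbo:Broadcaster": 1}
graph_weight = {"dbo:Country": 0.7, "dbo:Broadcaster": 0.2}
w = feature_weight("dbo:Country", counts, vocabulary, graph_weight)
```

Here the smoothed frequency of dbo:Country is (2 + 1) / (4 + 3) = 3/7, so its weight is the square root of 3/7 × 0.7 = 0.3.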
17
Experiments Framework for Twitter Topic 
Classification with DBpedia 
• Do semantic features built from DBpedia graphs
aid cross-epoch topic classification of
tweets?
• Which feature type provides a more stable topic
representation over time?
18
Experiments Framework for Twitter Topic 
Classification with DBpedia 
[Figure: microposts from 2010, 2011 and 2013; DBpedia dumps 3.6, 3.7 and 3.8; resources from DBpedia 3.9]
19
Experiments Datasets 
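Tweets              2010       2011      2013
Collection period   Nov-Dec    Aug       Sep
Number of tweets    1×10^6     1×10^6    1×10^6
Each tweet is assigned a topic label from a pool of over 10 categories.
Violence-related topics: Disaster and Accident (D&A), Law and Crime (L&C), War and Conflict (W&C)
• Manual annotation is performed until there are 1K tweets per year per topic.
• A negative set of 1K tweets per year is drawn from topics other than these three.
• In total, 12K annotated tweets (3 topics × 3 years × 1K, plus 3 × 1K negatives).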
20
Experiments Framework for Twitter Topic 
Classification with DBpedia 
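[Figure: the framework (microposts from 2010, 2011 and 2013; DBpedia dumps 3.6, 3.7 and 3.8; resources from DBpedia 3.9) with the Concept Enrichment step, illustrated by the entities <dbp:Hosni_Mubarak>, <dbp:Barack_Obama>, <dbp:Egypt> and <dbp:CNN>]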
21
Experiments Framework for Twitter Topic 
Classification with DBpedia 
[Figure: the framework (microposts from 2010, 2011 and 2013; DBpedia dumps 3.6, 3.7 and 3.8; resources from DBpedia 3.9) with the steps Concept Enrichment, Resource Backtrack Mapping (2010, 2011, 2013) and Deriving Semantic Graph Snapshots]
22
Experiments Framework for Twitter Topic 
Classification with DBpedia 
[Figure: the framework (microposts from 2010, 2011 and 2013; DBpedia dumps 3.6, 3.7 and 3.8; resources from DBpedia 3.9) with the steps Concept Enrichment, Resource Backtrack Mapping (2010, 2011, 2013), Deriving Semantic Graph Snapshots and DBpedia Topic Relevance-based Feature Weighting]
23
Experiments Datasets 
[Figure: lexical (LEX: BoW) and semantic (SEMANTIC: Category, Property, Resource, Class) features for the W&C, D&A, L&C and NEG sets in 2010, 2011 and 2013]
24
Experiments Framework for Twitter Topic 
Classification with DBpedia 
[Figure: the complete framework (microposts from 2010, 2011 and 2013; DBpedia dumps 3.6, 3.7 and 3.8; resources from DBpedia 3.9) with the steps Concept Enrichment, Resource Backtrack Mapping (2010, 2011, 2013), Deriving Semantic Graph Snapshots, DBpedia Topic Relevance-based Feature Weighting, Topic Labelled Microposts (2010, 2011, 2013) and Build Topic Classifier]
25
Experiments Understanding the Stability of 
a Topic Representation 
Same epoch scenario
[Figure: lexical, semantic and combined topic representations, trained and tested within the same epoch t]
26
Experiments Epoch Scenarios 
Same epoch scenario (trained on 2010, tested on 2010)
All the experiments reported in our paper were conducted using a 10-fold cross-validation setting.
Feature     Disaster_Acc F1   Law_Crime F1   War_Conflict F1
BoW         0.831             0.765          0.844
Category    0.697             0.650          0.744
Property    0.680             0.639          0.720
Resource    0.692             0.637          0.762
Class       0.633             0.583          0.637
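The same-epoch setting relies on 10-fold cross-validation and per-topic F1. The sketch below shows both pieces with the standard library only; the helper names `k_fold_indices` and `f1` are hypothetical, the folds are contiguous and unshuffled for simplicity (a real setup would usually shuffle and possibly stratify), and F1 is defined as 0 when there are no true positives.

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous folds over n examples."""
    # The first n % k folds get one extra example so that every index is used once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def f1(y_true, y_pred, positive):
    """F1 for one topic label: 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


folds = list(k_fold_indices(25, 10))
```

In a full run, a classifier would be trained on each `train` split, F1 computed on the matching `test` split, and the ten scores averaged.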
27
Experiments Understanding the Stability of 
a Topic Representation 
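Same epoch scenario
[Figure: lexical, semantic and combined topic representations, trained and tested within the same epoch t]
Cross-epoch scenario
[Figure: representations trained on epoch t and tested on epoch t+1]
28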
Experiments Epoch Scenarios 
Cross-epoch scenario (trained on an earlier epoch, tested on a later one): Disaster_Acc
Feature     2010-2011 F1   2010-2013 F1   2011-2013 F1   Average F1
BoW         0.634          0.481          0.261          0.458
Category    0.683          0.539          0.524          0.582
Property    0.665          0.557          0.502          0.574
Resource    0.774          0.544          0.445          0.587
Class       0.691          0.665          0.669          0.675
29
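The cross-epoch protocol pairs every epoch with each later one and averages F1 over the pairs. The sketch below, which assumes that reading of the column headers, enumerates the train/test pairs with `itertools.combinations` and averages the Disaster_Acc scores from the cross-epoch table, which can be used to check the Average column. The names `EPOCHS`, `PAIRS`, `DA_F1` and `average_f1` are ours, and recomputed averages may differ from the reported ones in the third decimal because the per-pair scores are themselves rounded.

```python
from itertools import combinations

EPOCHS = [2010, 2011, 2013]
# Train on the earlier epoch, test on the later one.
PAIRS = list(combinations(EPOCHS, 2))  # [(2010, 2011), (2010, 2013), (2011, 2013)]

# Disaster_Acc F1 scores from the cross-epoch table, in PAIRS order.
DA_F1 = {
    "BoW": [0.634, 0.481, 0.261],
    "Category": [0.683, 0.539, 0.524],
    "Property": [0.665, 0.557, 0.502],
    "Resource": [0.774, 0.544, 0.445],
    "Class": [0.691, 0.665, 0.669],
}


def average_f1(scores):
    """Mean F1 over the cross-epoch pairs for each feature type."""
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


for name, value in average_f1(DA_F1).items():
    print(f"{name}: {value:.3f}")
```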
Experiments Epoch Scenarios 
Averaged cross-epoch scenarios
Feature     Disaster_Acc F1   Law_Crime F1   War_Conflict F1   Average F1
BoW         0.458             0.620          0.531             0.536
Category    0.582             0.537          0.453             0.524
Property    0.574             0.504          0.506             0.528
Resource    0.587             0.578          0.466             0.544
Class       0.675             0.647          0.664             0.662
30
Conclusions 
• Semantic features decay much more slowly
than lexical features.
• Semantic representations improve performance in
cross-epoch scenarios.
• Class-based features alone achieve an average
gain of 7% over lexical features in cross-epoch
scenarios.
31
Future Work 
• Concept-drift tracking for transfer learning using 
Linked Data sources. 
• Study cross-epoch transfer learning approaches 
using semantic features. 
32
Questions 
ampaeli@gmail.com 
@pixarelli 
33

Contenu connexe

En vedette

Harnessing Linked Knowledge Sources for Topic Classification in Social Media
Harnessing Linked Knowledge Sources for Topic Classification in Social MediaHarnessing Linked Knowledge Sources for Topic Classification in Social Media
Harnessing Linked Knowledge Sources for Topic Classification in Social MediaAmparo Elizabeth Cano Basave
 
Pedir Servir Traer
Pedir  Servir  TraerPedir  Servir  Traer
Pedir Servir Traernrodriguez
 
Product CEO vs The World
Product CEO vs The WorldProduct CEO vs The World
Product CEO vs The WorldTariq Krim
 
Detecting child grooming behaviour patterns on social media
Detecting child grooming behaviour patterns on social mediaDetecting child grooming behaviour patterns on social media
Detecting child grooming behaviour patterns on social mediaAmparo Elizabeth Cano Basave
 
Representing, Proving and Sharing Trustworthiness of Web Resources Using Vera...
Representing, Proving and Sharing Trustworthiness of Web Resources Using Vera...Representing, Proving and Sharing Trustworthiness of Web Resources Using Vera...
Representing, Proving and Sharing Trustworthiness of Web Resources Using Vera...Amparo Elizabeth Cano Basave
 
A Study of the Impact of Persuasive Argumentation in Political Debates
A Study of the Impact of Persuasive Argumentation in Political DebatesA Study of the Impact of Persuasive Argumentation in Political Debates
A Study of the Impact of Persuasive Argumentation in Political DebatesAmparo Elizabeth Cano Basave
 
Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...
Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...
Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...Amparo Elizabeth Cano Basave
 
Volatile Classification of Point of Interests based on Social Activity Streams
Volatile Classification of Point of Interests based on Social Activity StreamsVolatile Classification of Point of Interests based on Social Activity Streams
Volatile Classification of Point of Interests based on Social Activity StreamsAmparo Elizabeth Cano Basave
 
Units Of Measurement Spanish
Units Of  Measurement  SpanishUnits Of  Measurement  Spanish
Units Of Measurement Spanishnrodriguez
 
Introduction to Biometric lectures... Prepared by Dr.Abbas
Introduction to Biometric lectures... Prepared by Dr.AbbasIntroduction to Biometric lectures... Prepared by Dr.Abbas
Introduction to Biometric lectures... Prepared by Dr.AbbasBasra University, Iraq
 
Reflexive Verb Intro
Reflexive Verb IntroReflexive Verb Intro
Reflexive Verb Intronrodriguez
 
El Modo Imperativo Updated
El Modo Imperativo UpdatedEl Modo Imperativo Updated
El Modo Imperativo Updatednrodriguez
 

En vedette (16)

Locklear
LocklearLocklear
Locklear
 
Violence det ijcnlp13-slideshare
Violence det ijcnlp13-slideshareViolence det ijcnlp13-slideshare
Violence det ijcnlp13-slideshare
 
Harnessing Linked Knowledge Sources for Topic Classification in Social Media
Harnessing Linked Knowledge Sources for Topic Classification in Social MediaHarnessing Linked Knowledge Sources for Topic Classification in Social Media
Harnessing Linked Knowledge Sources for Topic Classification in Social Media
 
Pedir Servir Traer
Pedir  Servir  TraerPedir  Servir  Traer
Pedir Servir Traer
 
Product CEO vs The World
Product CEO vs The WorldProduct CEO vs The World
Product CEO vs The World
 
Does sizematter
Does sizematterDoes sizematter
Does sizematter
 
Detecting child grooming behaviour patterns on social media
Detecting child grooming behaviour patterns on social mediaDetecting child grooming behaviour patterns on social media
Detecting child grooming behaviour patterns on social media
 
Ekaw2010 tutorial3 practical
Ekaw2010 tutorial3 practicalEkaw2010 tutorial3 practical
Ekaw2010 tutorial3 practical
 
Representing, Proving and Sharing Trustworthiness of Web Resources Using Vera...
Representing, Proving and Sharing Trustworthiness of Web Resources Using Vera...Representing, Proving and Sharing Trustworthiness of Web Resources Using Vera...
Representing, Proving and Sharing Trustworthiness of Web Resources Using Vera...
 
A Study of the Impact of Persuasive Argumentation in Political Debates
A Study of the Impact of Persuasive Argumentation in Political DebatesA Study of the Impact of Persuasive Argumentation in Political Debates
A Study of the Impact of Persuasive Argumentation in Political Debates
 
Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...
Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...
Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...
 
Volatile Classification of Point of Interests based on Social Activity Streams
Volatile Classification of Point of Interests based on Social Activity StreamsVolatile Classification of Point of Interests based on Social Activity Streams
Volatile Classification of Point of Interests based on Social Activity Streams
 
Units Of Measurement Spanish
Units Of  Measurement  SpanishUnits Of  Measurement  Spanish
Units Of Measurement Spanish
 
Introduction to Biometric lectures... Prepared by Dr.Abbas
Introduction to Biometric lectures... Prepared by Dr.AbbasIntroduction to Biometric lectures... Prepared by Dr.Abbas
Introduction to Biometric lectures... Prepared by Dr.Abbas
 
Reflexive Verb Intro
Reflexive Verb IntroReflexive Verb Intro
Reflexive Verb Intro
 
El Modo Imperativo Updated
El Modo Imperativo UpdatedEl Modo Imperativo Updated
El Modo Imperativo Updated
 

Similaire à Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs

Kdd 2014 tutorial bringing structure to text - chi
Kdd 2014 tutorial   bringing structure to text - chiKdd 2014 tutorial   bringing structure to text - chi
Kdd 2014 tutorial bringing structure to text - chiBarbara Starr
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Searchkrisztianbalog
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisJonathan Stray
 
The Empirical Turn in Knowledge Representation
The Empirical Turn in Knowledge RepresentationThe Empirical Turn in Knowledge Representation
The Empirical Turn in Knowledge RepresentationFrank van Harmelen
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Leon Derczynski
 
Implementing the Open Government Directive using the technologies of the Soci...
Implementing the Open Government Directive using the technologies of the Soci...Implementing the Open Government Directive using the technologies of the Soci...
Implementing the Open Government Directive using the technologies of the Soci...George Thomas
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web PagesMichael Nelson
 
Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Mariana Damova, Ph.D
 
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)Prashant Khare
 
Analyzing the Evolution of Vocabulary Terms and Their Impact on the LOD Cloud
Analyzing the Evolution of Vocabulary Terms and Their Impact on the LOD CloudAnalyzing the Evolution of Vocabulary Terms and Their Impact on the LOD Cloud
Analyzing the Evolution of Vocabulary Terms and Their Impact on the LOD CloudMOVING Project
 
Keynote reusability measurement and social community analysis from mooc con...
Keynote   reusability measurement and social community analysis from mooc con...Keynote   reusability measurement and social community analysis from mooc con...
Keynote reusability measurement and social community analysis from mooc con...HannibalHsieh
 
(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web PagesMichael Nelson
 
Entity Linking, Link Prediction, and Knowledge Graph Completion
Entity Linking, Link Prediction, and Knowledge Graph CompletionEntity Linking, Link Prediction, and Knowledge Graph Completion
Entity Linking, Link Prediction, and Knowledge Graph CompletionJennifer D'Souza
 
Using Machine Learning to aid Journalism at the New York Times
Using Machine Learning to aid Journalism at the New York TimesUsing Machine Learning to aid Journalism at the New York Times
Using Machine Learning to aid Journalism at the New York TimesVivian S. Zhang
 
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesSynchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesMichael Nelson
 
Kid171 chap0 english version
Kid171 chap0 english versionKid171 chap0 english version
Kid171 chap0 english versionFrank S.C. Tseng
 
2013 10-03-semantics-meetup-s buxton-mark_logic_pub
2013 10-03-semantics-meetup-s buxton-mark_logic_pub2013 10-03-semantics-meetup-s buxton-mark_logic_pub
2013 10-03-semantics-meetup-s buxton-mark_logic_pubStephen Buxton
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligencekrisztianbalog
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph FuturesPaul Groth
 

Similaire à Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs (20)

Kdd 2014 tutorial bringing structure to text - chi
Kdd 2014 tutorial   bringing structure to text - chiKdd 2014 tutorial   bringing structure to text - chi
Kdd 2014 tutorial bringing structure to text - chi
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Search
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text Analysis
 
The Empirical Turn in Knowledge Representation
The Empirical Turn in Knowledge RepresentationThe Empirical Turn in Knowledge Representation
The Empirical Turn in Knowledge Representation
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
 
All good things
All good thingsAll good things
All good things
 
Implementing the Open Government Directive using the technologies of the Soci...
Implementing the Open Government Directive using the technologies of the Soci...Implementing the Open Government Directive using the technologies of the Soci...
Implementing the Open Government Directive using the technologies of the Soci...
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages
 
Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011
 
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
 
Analyzing the Evolution of Vocabulary Terms and Their Impact on the LOD Cloud
Analyzing the Evolution of Vocabulary Terms and Their Impact on the LOD CloudAnalyzing the Evolution of Vocabulary Terms and Their Impact on the LOD Cloud
Analyzing the Evolution of Vocabulary Terms and Their Impact on the LOD Cloud
 
Keynote reusability measurement and social community analysis from mooc con...
Keynote   reusability measurement and social community analysis from mooc con...Keynote   reusability measurement and social community analysis from mooc con...
Keynote reusability measurement and social community analysis from mooc con...
 
(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages
 
Entity Linking, Link Prediction, and Knowledge Graph Completion
Entity Linking, Link Prediction, and Knowledge Graph CompletionEntity Linking, Link Prediction, and Knowledge Graph Completion
Entity Linking, Link Prediction, and Knowledge Graph Completion
 
Using Machine Learning to aid Journalism at the New York Times
Using Machine Learning to aid Journalism at the New York TimesUsing Machine Learning to aid Journalism at the New York Times
Using Machine Learning to aid Journalism at the New York Times
 
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesSynchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web Pages
 
Kid171 chap0 english version
Kid171 chap0 english versionKid171 chap0 english version
Kid171 chap0 english version
 
2013 10-03-semantics-meetup-s buxton-mark_logic_pub
2013 10-03-semantics-meetup-s buxton-mark_logic_pub2013 10-03-semantics-meetup-s buxton-mark_logic_pub
2013 10-03-semantics-meetup-s buxton-mark_logic_pub
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligence
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
 

Dernier

Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)Data & Analytics Magazin
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptaigil2
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.JasonViviers2
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 

Dernier (17)

Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 

Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs

  • 1. Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs A. Elizabeth Cano (@pixarelli) amparo.cano@open.ac.uk Yulan He y.he9@aston.ac.uk Harith Alani h.alani@open.ac.uk 1
  • 3. Introduction Representing Topics in Dynamic Environments Techniques for topic classification of Social Media are sensitive to the evolution of topics #Jan24 dead Egypt protest security Egypt Pres Morsi Tehran Syrian uprising Boston bombing suspect Watertown Obama strategy ISIS 3 dead in protest in Egypt. Security official vows to ‘deal firmly..#Jan24 Egypt Pres Morsi uses his visit to Tehran to praise the Syrian uprising #Boston bombing suspect “pinned down” on boat in Watertown Why Obama needs to rethink his entire ISIS strategy… 2011 2012 2013 2014 3
  • 4. Introduction: Challenges • Keeping a model up to date requires regular retuning. • Manual annotation is expensive. Questions • Which feature types provide a more stable representation of a topic?
  • 5. Introduction: Previous work. Using local features • Bag of Words (BoW) [Genc et al., 2011] • BoW + Bag of Entities (BoE) [Vitale et al., 2012] • BoW + BoE + Part-of-Speech (PoS) tagging [Munoz et al., 2011] [Varga et al., 2012]. Exploiting the link structure of a knowledge source • Exploiting categories containing entities [Michelson et al., 2010] • Relating tweets to Wikipedia resources [Milne et al., 2008] [Xu et al., 2011] • Using semantic features for topic classification [Cano et al., 2013] [Varga et al., 2014]
  • 6. Introduction: Topic Evolution [Figure: a topic in a Twitter corpus evolving from epoch t to epoch t+1, shown with semantic and lexical feature panels]
  • 7. Introduction: Characterising Topic Changes with DBpedia. Some features remain unchanged across DBpedia versions, while others provide information about past, current or future contexts (e.g. dbp:UnitedStatesPresidentialCandidates). [Figure: the DBpedia graph around dbp:Barack_Obama, with edges such as dbo:birthPlace dbp:Hawaii, dbo:spouse dbp:Michelle_Obama, rdf:type yago:PresidentOfTheUnitedStates (rdfs:subClassOf dbo:Person), dbo:author dbp:The_Audacity_of_Hope and dbp:Dreams_from_My_Father, and skos:subject category:Community_organisers and category:Columbia_University_Alumni; version-labelled dbo:wikiPageWikiLink and skos:subject links include dbp:Budget_Control_Act_of_2011 (3.8), dbp:Al-Qaeda (3.7), dbp:United_States_National_Council (3.6) and category:United_States_presidential_candidates,_2012]
  • 8. Approach: DBpedia Graph Snapshots. Definition (Time-dependent Resource Meta Graph): a sequence of tuples $G := (R, P, C, Y, f_t)$ where • $R$, $P$, $C$ are finite sets whose elements are resources, properties and classes, respectively; • $Y$ is a ternary relation $Y \subseteq R \times P \times C$ representing a hypergraph with ternary edges; • $Y$ induces a tripartite hypergraph $H(Y) = \langle V, D \rangle$ whose vertices are $V = R \cup P \cup C$ and whose hyperedges are $D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$; • $f_t$ assigns a temporal marker to each ternary edge.
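As a rough illustration of the definition on slide 8, the Python sketch below stores each ternary edge (r, p, c) under its temporal marker and builds the tripartite hypergraph H(Y) = <V, D> for one snapshot. It assumes DBpedia version strings such as "3.6" serve as the markers and that a triple present in several versions is stored once per version; the class name `TimeStampedMetaGraph` and the toy triples are hypothetical, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical sketch of G := (R, P, C, Y, f_t): every ternary edge (r, p, c)
# is stored under the temporal marker it carries (here a DBpedia version).
Edge = tuple[str, str, str]


class TimeStampedMetaGraph:
    def __init__(self) -> None:
        # marker -> set of ternary edges (r, p, c) that carry this marker
        self._edges_by_marker: defaultdict[str, set[Edge]] = defaultdict(set)

    def add_edge(self, r: str, p: str, c: str, marker: str) -> None:
        # Records that (r, p, c) holds in the snapshot labelled `marker`.
        self._edges_by_marker[marker].add((r, p, c))

    def snapshot(self, marker: str) -> set[Edge]:
        # The edges of Y whose temporal marker equals `marker`.
        return set(self._edges_by_marker.get(marker, set()))

    def hypergraph(self, marker: str) -> tuple[set[tuple[str, str]], set[frozenset]]:
        # H(Y) = <V, D> for one snapshot. Vertices are tagged with their role
        # ("R", "P" or "C") so that V = R ∪ P ∪ C stays tripartite.
        edges = self.snapshot(marker)
        vertices = (
            {("R", r) for r, _, _ in edges}
            | {("P", p) for _, p, _ in edges}
            | {("C", c) for _, _, c in edges}
        )
        hyperedges = {frozenset({("R", r), ("P", p), ("C", c)}) for r, p, c in edges}
        return vertices, hyperedges


# Toy data (hypothetical): two class edges in 3.6, one in 3.7.
g = TimeStampedMetaGraph()
g.add_edge("dbp:Barack_Obama", "rdf:type", "yago:PresidentOfTheUnitedStates", "3.6")
g.add_edge("dbp:Egypt", "rdf:type", "dbo:Country", "3.6")
g.add_edge("dbp:CNN", "rdf:type", "dbo:Broadcaster", "3.7")
V, D = g.hypergraph("3.6")
print(len(V), len(D))  # 5 2
```

Tagging vertices with their role is a design choice of this sketch: without it, a string used both as a resource and as a class would collapse into one vertex and the graph would no longer be tripartite.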
  • 9. Approach: Semantic Representation of a Tweet. DBpedia entities in the tweet: <dbp:Hosni_Mubarak>, <dbp:Barack_Obama>, <dbp:Egypt>, <dbp:CNN> (dbp: http://dbpedia.org/resource/)
  • 10. Approach: Semantic Representation of a Tweet, Class Features (rdf:type): <dbp:Hosni_Mubarak> rdf:type <dbo:OfficeHolder>; <dbp:Barack_Obama> rdf:type <yago:NobelPeacePrizeLaureates>; <dbp:Egypt> rdf:type <dbo:Country>; <dbp:CNN> rdf:type <dbo:Broadcaster> (dbo: http://dbpedia.org/ontology/)
  • 11. Approach: Semantic Representation of a Tweet, Property Features: <dbp:Hosni_Mubarak> dbprop:title <dbp:Prime_Minister_of_Egypt>; <dbp:Barack_Obama> dbprop:nationality "American"; <dbp:Egypt> dbprop:languages <dbp:Egyptian_Arabic>; <dbp:CNN> dbprop:headquarters <dbp:Atlanta> (skos: http://dbpedia.org/resource/Category:)
  • 12. Approach: Semantic Representation of a Tweet, Category Features (skos): <dbp:Hosni_Mubarak> dcterms:subject <skos:PresidentsOfEgypt>; <dbp:Barack_Obama> dcterms:subject <skos:Presidents_of_the_United_States>; <dbp:Egypt> dcterms:subject <skos:Arab_republics>; <dbp:CNN> dcterms:subject <skos:English-language_television_stations> (skos: http://dbpedia.org/resource/Category:)
  • 13. Approach: Semantic Representation of a Tweet, Resource Features: <dbp:Hosni_Mubarak> dbprop:title <dbp:Prime_Minister_of_Egypt>; <dbp:Barack_Obama> linked via dbprop:commander to <dbp:Death_Of_Osama_Bin_Laden>; <dbp:Egypt> dbprop:languages <dbp:Egyptian_Arabic>; <dbp:CNN> dbprop:headquarters <dbp:Atlanta>
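The four feature types on slides 10-13 can be sketched in Python as a function that turns the DBpedia triples of a tweet's entities into bags of class, category, property and resource features. The predicate-to-feature-type mapping below (rdf:type for classes, dcterms:subject for categories, any other predicate as a property feature, and the resource-valued objects of those predicates as resource features) and the function name `semantic_features` are assumptions drawn from the slide examples, not the authors' exact extraction rules.

```python
from collections import Counter

# Assumed predicate-to-feature-type mapping, read off the slide examples.
CLASS_PREDICATE = "rdf:type"
CATEGORY_PREDICATE = "dcterms:subject"


def semantic_features(triples: list[tuple[str, str, str]]) -> dict[str, Counter]:
    """Turn (entity, predicate, object) triples for a tweet's entities into
    bags of class, category, property and resource features."""
    bags = {name: Counter() for name in ("class", "category", "property", "resource")}
    for _entity, predicate, obj in triples:
        if predicate == CLASS_PREDICATE:
            bags["class"][obj] += 1
        elif predicate == CATEGORY_PREDICATE:
            bags["category"][obj] += 1
        else:
            bags["property"][predicate] += 1
            # Crude heuristic: prefixed names such as "dbp:Atlanta" are resources,
            # plain literals such as "American" are not.
            if ":" in obj:
                bags["resource"][obj] += 1
    return bags


triples = [
    ("dbp:Hosni_Mubarak", "rdf:type", "dbo:OfficeHolder"),
    ("dbp:Hosni_Mubarak", "dcterms:subject", "skos:PresidentsOfEgypt"),
    ("dbp:Hosni_Mubarak", "dbprop:title", "dbp:Prime_Minister_of_Egypt"),
    ("dbp:Barack_Obama", "dbprop:nationality", "American"),
    ("dbp:CNN", "dbprop:headquarters", "dbp:Atlanta"),
]
features = semantic_features(triples)
print(dict(features["resource"]))
```

Each bag can then be used on its own as a feature space, which is how the per-feature-type results later in the deck (Category, Property, Resource, Class) are organised.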
  • 14. Approach: DBpedia Graph Snapshots. That is, the meta-graph of entity e is the aggregation of all resources, properties and classes related to this entity at time t. [Figure: properties and resources of <dbp:Barack_Obama> in DBpedia 3.6, 3.7, 3.8, …; prop:spouse <MichelleObama> and prop:birthPlace <Hawaii> appear in every version, while later versions add prop:commander and prop:wikiPageWikiLink links to resources such as <UnitedStatesPresidentialCandidates>, <dbp:Death_Of_Osama_Bin_Laden> and <Budget_Control_Act_of_2011>]
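To make the snapshot comparison on slide 14 concrete, the Python sketch below takes the (property, value) pairs of one entity in several DBpedia versions and separates the features present in every version from those each version adds. The toy pairs loosely mirror the dbp:Barack_Obama example; which links belong to which version is our assumption, and `split_stable_features` is a hypothetical helper.

```python
Feature = tuple[str, str]  # (property, value) pair of one entity


def split_stable_features(
    snapshots: dict[str, set[Feature]],
) -> tuple[set[Feature], dict[str, set[Feature]]]:
    """Return the features shared by all snapshots, plus, for each version,
    the features it adds relative to the previous version (dict order)."""
    versions = list(snapshots)
    stable = set.intersection(*(snapshots[v] for v in versions)) if versions else set()
    added: dict[str, set[Feature]] = {}
    previous: set[Feature] = set()
    for version in versions:
        added[version] = snapshots[version] - previous
        previous = snapshots[version]
    return stable, added


# Toy meta-graphs of dbp:Barack_Obama (hypothetical version assignment).
core = {("prop:spouse", "MichelleObama"), ("prop:birthPlace", "Hawaii")}
obama = {
    "3.6": set(core),
    "3.7": core | {("prop:commander", "dbp:Death_Of_Osama_Bin_Laden")},
    "3.8": core | {("prop:wikiPageWikiLink", "Budget_Control_Act_of_2011")},
}
stable, added = split_stable_features(obama)
print(sorted(stable))
```

Features in `stable` are the kind the deck argues give a durable topic representation, while the per-version additions capture context that only appears later.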
  • 15. Approach: Semantic Feature Weighting Strategies. Topic Relevance-based Weighting Strategy: characterise the global relevance of a semantic feature to a given topic in DBpedia at a given point in time. [Figure: the topic graph as a subgraph of the DBpedia graph]
  • 16. Approach: Semantic Feature Weighting Strategies. Topic Relevance-based Weighting Strategy: • Class-based Topic Relevance (ClsW) • Property-based Topic Relevance (PropW) • Category-based Topic Relevance (CatW) • Resource Relevance (ResW)
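The slides name the four topic relevance-based strategies but do not give their formulas. Purely as an illustration, the Python sketch below assumes one simple instantiation: the relevance of a feature (a class, property, category or resource) to a topic is the fraction of DBpedia resources carrying that feature that also belong to the topic graph in the snapshot at time t. The ratio and the function name `topic_relevance` are assumptions, not the authors' ClsW/PropW/CatW/ResW definitions.

```python
def topic_relevance(
    resource_features: dict[str, set[str]],
    topic_resources: set[str],
) -> dict[str, float]:
    """Hypothetical relevance of each feature to a topic in one DBpedia snapshot.

    `resource_features` maps every resource in the snapshot to its features
    (e.g. its rdf:type classes for a ClsW-like weight). The assumed weight of
    feature f is |topic resources having f| / |all resources having f|.
    """
    total: dict[str, int] = {}
    in_topic: dict[str, int] = {}
    for resource, features in resource_features.items():
        for f in features:
            total[f] = total.get(f, 0) + 1
            if resource in topic_resources:
                in_topic[f] = in_topic.get(f, 0) + 1
    return {f: in_topic.get(f, 0) / n for f, n in total.items()}


# Toy snapshot (hypothetical): resources with their classes.
snapshot = {
    "dbp:Hosni_Mubarak": {"dbo:OfficeHolder"},
    "dbp:Barack_Obama": {"dbo:OfficeHolder"},
    "dbp:Egypt": {"dbo:Country"},
    "dbp:France": {"dbo:Country"},
    "dbp:CNN": {"dbo:Broadcaster"},
}
topic = {"dbp:Hosni_Mubarak", "dbp:Barack_Obama", "dbp:Egypt"}
w = topic_relevance(snapshot, topic)
print(w)
```

Because the counts come from a single snapshot, recomputing the weights on each DBpedia version yields time-dependent relevance values, in the spirit of slide 15.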
  • 17. Approach: Semantic Feature Weighting Strategies. Integrating weights into a tweet representation: the weight of semantic feature f in a document x is $W_x^{DB_t}(f) = \left[ \left( \frac{N_x^{DB_t}(f) + 1}{|F| + \sum_{f' \in F} N_x^{DB_t}(f')} \right) \cdot W^{DB_t}(f) \right]^{1/2}$, where the first factor is the frequency of f in x with Laplace smoothing over the feature set F, and $W^{DB_t}(f)$ is the weight derived from the DB_t graph.
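The per-tweet weight on slide 17 can be written out in Python as a minimal sketch. It assumes `counts` holds the raw frequency N_x(f) of each semantic feature in tweet x, `vocabulary` is the feature set F, and `graph_weight` supplies W^{DB_t}(f) from one of the topic relevance strategies; these names, and giving weight 0 to features with no graph weight, are our assumptions.

```python
import math


def tweet_feature_weights(
    counts: dict[str, int],
    vocabulary: set[str],
    graph_weight: dict[str, float],
) -> dict[str, float]:
    """W_x(f) = sqrt(((N_x(f) + 1) / (|F| + sum_f' N_x(f'))) * W(f)) for f in F."""
    # Laplace-smoothed denominator: |F| plus the total feature count in x.
    denominator = len(vocabulary) + sum(counts.get(f, 0) for f in vocabulary)
    weights = {}
    for f in vocabulary:
        smoothed = (counts.get(f, 0) + 1) / denominator
        # Features without a graph-derived weight get 0 (an assumption).
        weights[f] = math.sqrt(smoothed * graph_weight.get(f, 0.0))
    return weights


counts = {"dbo:Country": 2, "dbo:OfficeHolder": 1}
vocabulary = {"dbo:Country", "dbo:OfficeHolder", "dbo:Broadcaster"}
graph_weight = {"dbo:Country": 0.5, "dbo:OfficeHolder": 0.12, "dbo:Broadcaster": 0.0}
w = tweet_feature_weights(counts, vocabulary, graph_weight)
print(w["dbo:Country"])  # 0.5
```

Here |F| = 3 and the tweet has 3 feature occurrences, so the denominator is 6; for dbo:Country the smoothed frequency is 3/6 = 0.5, the product with W = 0.5 is 0.25, and its square root is 0.5.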
  • 18. Experiments: Framework for Twitter Topic Classification with DBpedia • Do semantic features built from DBpedia graphs aid cross-epoch topic classification of tweets? • Which feature type provides a more stable topic representation over time?
  • 19. Experiments: Framework for Twitter Topic Classification with DBpedia [Figure: microposts from 2010, 2011 and 2013 alongside DBpedia dumps 3.6, 3.7, 3.8 and 3.9]
  • 20. Experiments: Datasets. Tweets: 1×10^6 per year, from 2010 (Nov-Dec), 2011 (Aug) and 2013 (Sep). Each tweet is assigned a topic label from a pool of over 10 categories. Violence-related topics: • Disaster and Accident (D&A) • Law and Crime (L&C) • War and Conflict (W&C). Manual annotation was performed until 1K tweets per year per topic were obtained; a negative set adds 1K tweets per year from topics other than these three, giving 12K annotated tweets in total.
  • 21. Experiments: Framework for Twitter Topic Classification with DBpedia [Figure: Concept Enrichment, linking entities in the 2010, 2011 and 2013 microposts (e.g. <dbp:Hosni_Mubarak>, <dbp:Barack_Obama>, <dbp:Egypt>, <dbp:CNN>) to resources in DBpedia dumps 3.6, 3.7, 3.8 and 3.9]
  • 22. Experiments: Framework for Twitter Topic Classification with DBpedia [Figure: Concept Enrichment, followed by Resource Backtrack Mapping and Deriving Semantic Graph Snapshots for 2010, 2011 and 2013, using DBpedia dumps 3.6, 3.7, 3.8 and 3.9]
  • 23. Experiments: Framework for Twitter Topic Classification with DBpedia [Figure: Concept Enrichment, Resource Backtrack Mapping, Deriving Semantic Graph Snapshots (2010, 2011, 2013) and DBpedia Topic Relevance-based Feature Weighting, using DBpedia dumps 3.6, 3.7, 3.8 and 3.9]
  • 24. Experiments: Datasets [Figure: for each class (W&C, D&A, L&C, NEG) and year (2010, 2011, 2013), a lexical representation (LEX: BoW) and semantic representations (SEMANTIC: Category, Property, Resource, Class)]
  • 25. Experiments: Framework for Twitter Topic Classification with DBpedia [Figure: the full pipeline; Concept Enrichment, Resource Backtrack Mapping, Deriving Semantic Graph Snapshots (2010, 2011, 2013) and DBpedia Topic Relevance-based Feature Weighting, together with the topic-labelled microposts from 2010, 2011 and 2013, feed into Build Topic Classifier]
  • 26. Experiments: Understanding the Stability of a Topic Representation [Figure: same-epoch scenario; lexical, semantic and combined representations are trained and tested within the same epoch (t or t+1)]
  • 27. Experiments: Epoch Scenarios. Same-epoch scenario (trained on 2010, tested on 2010). All the experiments reported in our paper were conducted using a 10-fold cross-validation setting.
| Features | Disaster_Acc F1 | Law_Crime F1 | War_Conflict F1 |
|---|---|---|---|
| BoW | 0.831 | 0.765 | 0.844 |
| Category | 0.697 | 0.650 | 0.744 |
| Property | 0.680 | 0.639 | 0.720 |
| Resource | 0.692 | 0.637 | 0.762 |
| Class | 0.633 | 0.583 | 0.637 |
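For readers reproducing the same-epoch setting, here is a minimal stdlib-only Python sketch of a 10-fold split and of per-topic (binary) F1. The fold assignment (contiguous, without shuffling or stratification) and the helper names are our assumptions; the slides do not say how the folds were drawn.

```python
def k_fold_indices(n: int, k: int = 10) -> list[list[int]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Binary F1 for the positive (topic) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


folds = k_fold_indices(25)
print([len(f) for f in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

In each of the k rounds one fold serves as the test set and the remaining folds as training data; the reported F1 would then be averaged over the rounds.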
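  • 28. Experiments: Understanding the Stability of a Topic Representation [Figure: same-epoch scenario, where lexical, semantic and combined representations are trained and tested within the same epoch (t or t+1), and cross-epoch scenario, where a model trained on epoch t is tested on epoch t+1]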
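The two evaluation scenarios reduce to choosing (train, test) epoch pairs. The Python sketch below, with hypothetical helper names, lists same-epoch pairs and chronologically ordered cross-epoch pairs; for the epochs 2010, 2011 and 2013 the cross-epoch pairs match the three columns reported on slide 29.

```python
from itertools import combinations


def same_epoch_pairs(epochs: list[int]) -> list[tuple[int, int]]:
    # Train and test within one epoch (e.g. via cross-validation).
    return [(e, e) for e in sorted(epochs)]


def cross_epoch_pairs(epochs: list[int]) -> list[tuple[int, int]]:
    # Train on an earlier epoch and test on every later one.
    return list(combinations(sorted(epochs), 2))


epochs = [2010, 2011, 2013]
print(cross_epoch_pairs(epochs))  # [(2010, 2011), (2010, 2013), (2011, 2013)]
```

Sorting before taking combinations guarantees that the training epoch always precedes the test epoch, which is what makes the scenario a test of how a topic representation ages.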
  • 29. Experiments: Epoch Scenarios. Cross-epoch scenario, Disaster_Acc (trained on an earlier epoch, tested on a later one).
| Features | 2010-2011 F1 | 2010-2013 F1 | 2011-2013 F1 | Average F1 |
|---|---|---|---|---|
| BoW | 0.634 | 0.481 | 0.261 | 0.458 |
| Category | 0.683 | 0.539 | 0.524 | 0.582 |
| Property | 0.665 | 0.557 | 0.502 | 0.574 |
| Resource | 0.774 | 0.544 | 0.445 | 0.587 |
| Class | 0.691 | 0.665 | 0.669 | 0.675 |
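As a sanity check on the Average column of slide 29, the Python sketch below recomputes each row's mean over the three cross-epoch F1 scores with exact decimal arithmetic. Truncating (rounding down) to three decimals reproduces the BoW, Category, Resource and Class averages exactly and gives 0.574 for Property, the Disaster_Acc value reported on slide 30. The helper name `truncated_mean` is ours.

```python
from decimal import ROUND_DOWN, Decimal

# Disaster_Acc cross-epoch F1 scores (2010-2011, 2010-2013, 2011-2013) from slide 29.
SCORES = {
    "BoW": ("0.634", "0.481", "0.261"),
    "Category": ("0.683", "0.539", "0.524"),
    "Property": ("0.665", "0.557", "0.502"),
    "Resource": ("0.774", "0.544", "0.445"),
    "Class": ("0.691", "0.665", "0.669"),
}


def truncated_mean(values: tuple[str, ...], places: str = "0.001") -> Decimal:
    """Mean of decimal strings, truncated (not rounded) to the given precision."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal(places), rounding=ROUND_DOWN)


averages = {name: truncated_mean(vals) for name, vals in SCORES.items()}
print(averages["Property"])  # 0.574
```

Decimal strings are used instead of floats so that exact means such as 1.746 / 3 = 0.582 are not pushed just below the truncation boundary by binary rounding error.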
  • 30. Experiments: Epoch Scenarios. Averaged cross-epoch scenarios.
| Features | Disaster_Acc F1 | Law_Crime F1 | War_Conflict F1 | Average F1 |
|---|---|---|---|---|
| BoW | 0.458 | 0.620 | 0.531 | 0.536 |
| Category | 0.582 | 0.537 | 0.453 | 0.524 |
| Property | 0.574 | 0.504 | 0.506 | 0.528 |
| Resource | 0.587 | 0.578 | 0.466 | 0.544 |
| Class | 0.675 | 0.647 | 0.664 | 0.662 |
  • 31. Conclusions • Semantic features decay much more slowly than lexical features. • Semantic representations improve performance in cross-epoch scenarios. • Class-based features alone achieve an average gain of 7% over lexical features in cross-epoch scenarios.
  • 32. Future Work • Concept-drift tracking for transfer learning using Linked Data sources. • Studying cross-epoch transfer learning approaches using semantic features.