A. Elizabeth Cano¹, Andrea Varga², Matthew Rowe³, Fabio Ciravegna², and Yulan He⁴
¹ Knowledge Media Institute, The Open University, Milton Keynes
² University of Sheffield, Sheffield
³ Lancaster University, Lancaster
⁴ Aston University, Birmingham
UK, 2013
Harnessing Linked Knowledge Sources for
Topic Classification in Social Media
INTRODUCTION
Social media streams and the risk of violent and criminal activities
INTRODUCTION
Research Questions:
o  Can semantic features help in topic classification (TC)?
o  Which knowledge source (KS) data and KS taxonomies
provide useful information for improving the TC of tweets?
OUTLINE
• Introduction
- Topic Classification (TC) of Microposts
- Related Work
- State-of-the-art limitations
• Proposed Approach
• Experiments
• Findings
• Conclusions
INTRODUCTION
Difficulties of topic classification of microposts
o  Restricted number of characters
o  Irregular and ill-formed words
•  Mixing of upper- and lowercase letters
§  Makes it difficult to detect proper nouns and other part-of-speech tags.
•  Wide variety of language
§  E.g., “see u soon”
o  Event-dependent emerging jargon
•  Volatile jargon relevant to particular events
§  E.g., “Jan.25” (used during the Egyptian revolution)
o  High Topical Diversity
o  Sparse data
INTRODUCTION
Social Knowledge Sources (KS)
             DBpedia*       Yago2          Freebase
Resources    2.35 million   447 million    3.6 million
Classes      359            562,312        1,450
Properties   1,820          253,213,842    7,000
*Using the DBpedia ontology
o  Structured Semantic Web representation of data
•  Maintained by thousands of editors
§  E.g., DBpedia (derived from Wikipedia)
§  Freebase
•  Evolves and adapts as knowledge changes [Syed et al., 2008]
o  Cover a broad range of topics
o  Characterise topics with a large number of resources
INTRODUCTION
Local and External Metadata of a Tweet
Entity types detected by NER: NER:Country, NER:Person, NER:Person
<http://dbpedia.org/resource/Barack_Obama>
<http://dbpedia.org/resource/Egypt>
<http://dbpedia.org/resource/Hosni_Mubarak>
PROPOSED APPROACH
o  State-of-the-art limitations
§  Use of single knowledge sources.
§  Entities’ metadata is constrained by the NER service used (e.g., OpenCalais, Alchemy).
o  Our approach
§  Exploits multiple knowledge sources.
§  Enhances the entity metadata by deriving semantic graphs.
§  Leverages the graph structures surrounding entities present
in a KS for the TC task.
Exploiting Knowledge Sources for the Topic Classification of
Microposts
OUTLINE
• Introduction
• Proposed Approach
- Semantic Meta-graphs
- Weighting Schemas
- Enhancing TC with Semantic Features
• Experiments
• Findings
• Conclusions
PROPOSED APPROACH
Rationale…
(Figure: an example with two numbered entities, 1 and 2)
1: Could be more indicative of War and Conflict
2: Not necessarily a good indicator of War and Conflict
Can the graph structure of existing knowledge sources provide an abstraction of how these entity types are used to represent a topic?
PROPOSED APPROACH
Framework for Topic Classification of Tweets
(Figure: framework diagram with the components Retrieve Articles (DB, FB, DB-FB), Retrieve Tweets (TW), Annotate Tweets, Concept Enrichment, Derive Semantic Features, and Build Cross-Source Topic Classifier)
1. Datasets Collection: SPARQL query for all resources from a given topic (e.g., War).
2. Datasets Enrichment: From tweets and articles’ abstracts, extract entities and link them to resources in DBpedia and Freebase.
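The enrichment step pairs surface forms in a tweet with knowledge-source resources. The sketch below is a deliberately simple stand-in for that step, assuming a hand-written gazetteer (hypothetical; only the three DBpedia URIs come from the example tweet) and greedy longest-match over word n-grams. It is not the NER or linking service used by the authors, and it covers only the DBpedia side.

```python
import re

# Hypothetical gazetteer: lower-cased surface forms -> DBpedia resource URIs.
# The URIs are the ones shown for the example tweet; the dictionary approach
# and n-gram matching are illustrative assumptions, not the authors' pipeline.
GAZETTEER = {
    "obama": "http://dbpedia.org/resource/Barack_Obama",
    "barack obama": "http://dbpedia.org/resource/Barack_Obama",
    "egypt": "http://dbpedia.org/resource/Egypt",
    "mubarak": "http://dbpedia.org/resource/Hosni_Mubarak",
    "hosni mubarak": "http://dbpedia.org/resource/Hosni_Mubarak",
}


def link_entities(text, gazetteer=GAZETTEER, max_ngram=3):
    """Greedily link the longest matching word n-grams in `text` to resource URIs."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    links = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + n])
            if surface in gazetteer:
                links.append((surface, gazetteer[surface]))
                i += n
                break
        else:
            i += 1  # no match starting here; move on to the next token
    return links


print(link_entities("President Obama televised statement: Hosni Mubarak resignation, CNN says #Egypt"))
```

Preferring the longest match means "Hosni Mubarak" is linked once as a unit rather than leaving "hosni" unlinked and matching "mubarak" on its own.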
3. Semantic Features Derivation
4. Build a topic classifier based on features derived from cross-source data.
PROPOSED APPROACH
Deriving Semantic Meta-Graphs
<dbpedia:Barack_Obama, rdf:type, yago:PresidentOfTheUnitedStates>
<dbpedia:Barack_Obama, dbo:birthPlace, dbpedia:Hawaii>
PROPOSED APPROACH
Definition 1: Resource Meta-graph
A resource meta-graph is a tuple G := (R, P, C, Y) where
•  R, P, C are finite sets whose elements are resources, properties and classes, respectively;
•  Y is a ternary relation $Y \subseteq R \times P \times C$ representing a hypergraph with ternary edges;
•  the hypergraph $H(Y) = \langle V, D \rangle$ is a tripartite graph whose vertices are $V = R \cup P \cup C$ and whose edges are $D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$.
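As a hypothetical toy instance of Definition 1 (the sets are illustrative, not extracted from DBpedia), take one resource, two properties and one class:

```latex
\begin{aligned}
R &= \{\text{Obama}\}, \quad P = \{\text{birthPlace}, \text{spouse}\}, \quad C = \{\text{Person}\},\\
Y &= \{(\text{Obama}, \text{birthPlace}, \text{Person}),\ (\text{Obama}, \text{spouse}, \text{Person})\} \subseteq R \times P \times C,\\
V &= R \cup P \cup C = \{\text{Obama}, \text{birthPlace}, \text{spouse}, \text{Person}\},\\
D &= \{\{\text{Obama}, \text{birthPlace}, \text{Person}\},\ \{\text{Obama}, \text{spouse}, \text{Person}\}\}.
\end{aligned}
```

Each hyperedge in D joins exactly one vertex from each of R, P and C, which is what makes H(Y) tripartite.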
PROPOSED APPROACH
Resource Meta-graph
The meta-graph of entity e is the aggregation of all resources,
properties and classes related to this entity.
(Figure: projections of the Obama meta-graph)
Projecting on properties: Obama → birthPlace, author, spouse
Projecting on classes: Obama → LivingPeople, PresidentOfTheUnitedStates, Person, Author
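A minimal sketch of building an entity's meta-graph from RDF-style triples and projecting it onto properties or classes. The slides do not say how the ternary edges (r, p, c) are formed; this sketch assumes each non-type property of the entity is paired with each of its rdf:type classes. The triples are toy data in the style of the DBpedia examples above.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def build_meta_graph(entity, triples):
    """Return (R, P, C, Y) for `entity` from (subject, predicate, object) triples.

    Assumption (not stated on the slides): every non-type property of the entity
    is paired with every rdf:type class of the entity to form an edge (r, p, c).
    """
    properties = {p for s, p, o in triples if s == entity and p != RDF_TYPE}
    classes = {o for s, p, o in triples if s == entity and p == RDF_TYPE}
    y = {(entity, p, c) for p in properties for c in classes}
    return {entity}, properties, classes, y


def project(y, axis):
    """Project ternary edges (r, p, c) onto properties (axis=1) or classes (axis=2)."""
    if axis not in (1, 2):
        raise ValueError("axis must be 1 (properties) or 2 (classes)")
    projection = defaultdict(set)
    for edge in y:
        projection[edge[0]].add(edge[axis])
    return dict(projection)


# Toy triples (illustrative only).
triples = [
    ("dbpedia:Barack_Obama", "rdf:type", "yago:PresidentOfTheUnitedStates"),
    ("dbpedia:Barack_Obama", "rdf:type", "dbo:Person"),
    ("dbpedia:Barack_Obama", "dbo:birthPlace", "dbpedia:Hawaii"),
    ("dbpedia:Barack_Obama", "dbo:spouse", "dbpedia:Michelle_Obama"),
]
R, P, C, Y = build_meta_graph("dbpedia:Barack_Obama", triples)
print(project(Y, 1))  # properties of Obama
print(project(Y, 2))  # classes of Obama
```

Under this pairing assumption, an entity with properties but no rdf:type classes yields an empty Y, so a real implementation would need a fallback for untyped resources.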
How can we weight these graphs to reveal the semantic features that characterise Obama in the context of violence?
PROPOSED APPROACH
Weighting Semantic Features
Specificity
Measures the relative importance of a property p to a given class c, with p ∈ G(e) and c ∈ G(e), in a KS graph G_KS:
$\mathrm{specificity}_{KS}(p,c) = \frac{N_p(R(c))}{N(R(c))}$
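A minimal sketch of the specificity weight over a hypothetical mini knowledge source. The slide does not define N_p explicitly; this sketch assumes R(c) is the set of resources typed with c, N(R(c)) is its size, and N_p(R(c)) counts how many of those resources use property p.

```python
def specificity(resources, p, c):
    """specificity_KS(p, c) = N_p(R(c)) / N(R(c)).

    `resources` maps a resource id to a pair (classes, properties).
    Assumed reading: R(c) = resources typed with c; N_p counts those using p.
    """
    r_c = [props for classes, props in resources.values() if c in classes]
    if not r_c:
        return 0.0  # class c has no resources in this KS
    return sum(1 for props in r_c if p in props) / len(r_c)


# Hypothetical mini knowledge source (not real DBpedia data).
KS = {
    "Barack_Obama": ({"President"}, {"birthPlace", "spouse"}),
    "Hosni_Mubarak": ({"President"}, {"birthPlace", "militaryRank"}),
    "Tom_Hanks": ({"Actor"}, {"birthPlace", "spouse"}),
    "Meryl_Streep": ({"Actor"}, {"birthPlace", "spouse"}),
}
print(specificity(KS, "spouse", "President"))
```

Here every President resource has birthPlace (specificity 1.0) while only half have spouse (0.5).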
PROPOSED APPROACH
Weighting Semantic Features
Generality
Captures the specialisation of a property p to a given class c by computing the property’s frequency among other semantically related classes R’(c):
$\mathrm{generality}_{KS}(p,c) = \frac{N(R'(c))}{N_p(R'(c))}$
where N(R’(c)) is the number of resources whose type is either c or a specialisation of c’s parent classes.
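A minimal sketch of the generality weight, assuming a one-level class hierarchy given as a hypothetical `parent_of` map, so that R'(c) is read as the resources typed with c or with a sibling class sharing c's parent. Returning 0.0 when p never occurs in R'(c) is an assumption made here to avoid division by zero; the slides do not cover that case.

```python
def related_resources(resources, parent_of, c):
    """R'(c): resources typed with c or with a class sharing c's parent.

    Assumes a one-level hierarchy given by `parent_of` (class -> parent class).
    """
    parent = parent_of.get(c)
    related = {c} | {k for k, v in parent_of.items() if parent is not None and v == parent}
    return [props for classes, props in resources.values() if classes & related]


def generality(resources, parent_of, p, c):
    """generality_KS(p, c) = N(R'(c)) / N_p(R'(c)); 0.0 if p never occurs in R'(c)."""
    r_prime = related_resources(resources, parent_of, c)
    n_p = sum(1 for props in r_prime if p in props)
    return len(r_prime) / n_p if n_p else 0.0


# Hypothetical hierarchy and mini knowledge source (not real DBpedia data).
PARENT_OF = {"President": "Person", "Actor": "Person"}
KS = {
    "Barack_Obama": ({"President"}, {"birthPlace", "spouse"}),
    "Hosni_Mubarak": ({"President"}, {"birthPlace", "militaryRank"}),
    "Tom_Hanks": ({"Actor"}, {"birthPlace", "spouse"}),
    "Meryl_Streep": ({"Actor"}, {"birthPlace", "spouse"}),
}
print(generality(KS, PARENT_OF, "militaryRank", "President"))
```

The ratio behaves like an inverse document frequency: a property shared by all Person-like resources (birthPlace) scores 1.0, while one that is rare among them (militaryRank) scores 4.0.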
PROPOSED APPROACH
Weighting Semantic Features
$SG(p,c) = \mathrm{specificity}_{KS}(p,c) \times \mathrm{generality}_{KS}(p,c)$
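A worked example with hypothetical counts (not taken from the experiments): suppose 150 of the 200 resources of class c use property p, and 400 of the 2,000 resources in R'(c) use p.

```latex
\begin{aligned}
\mathrm{specificity}_{KS}(p,c) &= \frac{N_p(R(c))}{N(R(c))} = \frac{150}{200} = 0.75,\\
\mathrm{generality}_{KS}(p,c) &= \frac{N(R'(c))}{N_p(R'(c))} = \frac{2000}{400} = 5,\\
SG(p,c) &= 0.75 \times 5 = 3.75.
\end{aligned}
```

A property that most resources of c use, but that is comparatively rare among related classes, therefore receives a high SG weight.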
PROPOSED APPROACH
Enhancing Feature Space with Semantic Features
Semantic Augmentation (A1)
Class Features (A1-C): $F' = F + c_F$
Property Features (A1-P): $F' = F + p_F$
Class + Property Features (A1-C+P): $F' = F + c_F + p_F$
F: president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt
F (A1+P): dbpedia:birth, dbpedia:state, …., dbpedia-owl:PopulatedPlace/populationDensity….
F (A1+C): PopulatedPlace, Office_holder, PresidentOfTheUnitedStates, Politician…
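A minimal sketch of the A1 augmentation as bag-of-features construction. The word, class and property lists are the visible parts of the example above; prefixing semantic features with "class:" and "prop:" is an illustrative choice made here so they cannot collide with word features, not something the slides specify.

```python
from collections import Counter


def augment_a1(words, classes, properties, mode):
    """Semantic augmentation A1.

    mode "C":   F' = F + c_F
    mode "P":   F' = F + p_F
    mode "C+P": F' = F + c_F + p_F
    """
    if mode not in ("C", "P", "C+P"):
        raise ValueError(f"unknown mode: {mode}")
    features = list(words)
    if mode in ("C", "C+P"):
        features += [f"class:{c}" for c in classes]
    if mode in ("P", "C+P"):
        features += [f"prop:{p}" for p in properties]
    return features


F = ["president", "obama", "televised", "statement", "hosni",
     "mubarak", "resignation", "cnn", "says", "egypt"]
classes = ["PopulatedPlace", "Office_holder", "PresidentOfTheUnitedStates", "Politician"]
properties = ["dbpedia:birth", "dbpedia:state"]

# Bag-of-features counts, e.g. as input to a linear classifier.
bag = Counter(augment_a1(F, classes, properties, "C+P"))
print(len(bag))
```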
PROPOSED APPROACH
Enhancing Feature Space with Semantic Features
Semantic Augmentation with Generalisation (A2)
This augmentation exploits the subsumption relation among classes within the DBpedia or Freebase ontologies. In this case, we consider the set of parent classes of c.
Parent(c) Features (A2-C): $F' = F + parent(c)_F$
Parent(c) + Property Features (A2-C+P): $F' = F + p_F + parent(c)_F$
F: president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt
F (A2+parent(c)): Place, Office_holder, President, Politician…
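A minimal sketch of A2, which replaces each class with its parent before adding it to the feature set. The `parent_of` links below are hypothetical, chosen to mirror the example above, and keeping a class unchanged when it has no recorded parent is an assumption of this sketch.

```python
def augment_a2(words, classes, properties, parent_of, with_properties=False):
    """Semantic augmentation with generalisation (A2).

    with_properties=False: F' = F + parent(c)_F
    with_properties=True:  F' = F + p_F + parent(c)_F
    Assumption: a class with no entry in `parent_of` is kept as-is.
    """
    # Map classes to parents, dropping duplicates while preserving order.
    parents = list(dict.fromkeys(parent_of.get(c, c) for c in classes))
    features = list(words)
    if with_properties:
        features += [f"prop:{p}" for p in properties]
    features += [f"class:{c}" for c in parents]
    return features


# Hypothetical subsumption links.
PARENT_OF = {"PopulatedPlace": "Place", "PresidentOfTheUnitedStates": "President"}
classes = ["PopulatedPlace", "Office_holder", "PresidentOfTheUnitedStates", "Politician"]
print(augment_a2(["obama", "egypt"], classes, ["dbpedia:birth"], PARENT_OF))
```

Generalising to parent classes lets tweets about different specific entities (one president versus another) share the same class feature, which can help when training data is sparse.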
OUTLINE
• Introduction
• Proposed Approach
• Experiments
- Dataset
- Baseline Features
- Results
• Findings
• Conclusions
PROPOSED APPROACH
Datasets
o  Twitter Dataset [Abel et al., 2011] (TW)
§  Collected over two months starting in November 2010.
§  Topically annotated.
§  Uses tweets labelled as “War & Conflict” (War), “Law & Crime” (Cri), and “Disaster & Accident” (DisAcc).
§  Multi-labelled dataset comprising 10,189 tweets.
o  DBpedia (DB) and Freebase (FB) Dataset
§  Queried the SPARQL endpoints for all resources in the categories and subcategories (skos:Concept) of War, Cri and DisAcc.
•  DBpedia – 9,465 articles
•  Freebase – 16,915 articles
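A sketch of how the category-based collection could be phrased as a SPARQL 1.1 query for DBpedia. It assumes DBpedia's category modelling (resources attached to categories via dct:subject, subcategories linked via skos:broader) and also fetches English abstracts, since the enrichment step works on articles' abstracts. The category URI in the usage line is a hypothetical example, and only the DBpedia side is sketched.

```python
def build_topic_query(category_uri, limit=None):
    """Build a SPARQL 1.1 query for DBpedia resources under a category or its subcategories.

    Assumes resources link to categories via dct:subject and subcategories link
    to broader categories via skos:broader.
    """
    query = f"""PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?resource ?abstract WHERE {{
  ?category skos:broader* <{category_uri}> .
  ?resource dct:subject ?category .
  ?resource dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}}"""
    if limit is not None:
        query += f"\nLIMIT {int(limit)}"
    return query


print(build_topic_query("http://dbpedia.org/resource/Category:War", limit=100))
```

The `skos:broader*` property path also matches the root category itself (zero steps). An unbounded transitive closure over a category graph can be slow on a public endpoint and may drift off-topic, so a depth limit may be needed in practice.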
PROPOSED APPROACH
Experimental Setup A
1.  Use annotated Tweets for training (TW)
-  Baseline: Bag of Words (BoW), Bag of Entities (BoE),
and Part of Speech tags (PoS).
-  Enhance Features using the DBpedia and Freebase
graphs.
2.  Train an SVM classifier on the TW corpus, using an 80%/20% train/test split over five independent runs.
3.  Compute Precision, Recall, and F-measure.
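A minimal sketch of the evaluation protocol: five independent seeded 80%/20% splits with precision, recall and F-measure averaged over runs. The classifier is left as a hypothetical `train_and_predict` callable (the slides use an SVM, which this sketch does not implement), and the metrics are computed per topic label in binary form.

```python
import random


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F-measure for one topic label (1 = on-topic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def split_80_20(items, seed):
    """Shuffle with a fixed seed and return an 80% train / 20% test split."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def evaluate(dataset, train_and_predict, runs=5):
    """Average (P, R, F1) over `runs` independent 80/20 splits.

    `dataset` is a list of (features, label) pairs; `train_and_predict` is a
    placeholder that takes the training pairs and the test features and
    returns one predicted label per test item.
    """
    totals = [0.0, 0.0, 0.0]
    for seed in range(runs):
        train, test = split_80_20(dataset, seed)
        y_true = [label for _, label in test]
        y_pred = train_and_predict(train, [x for x, _ in test])
        for i, score in enumerate(precision_recall_f1(y_true, y_pred)):
            totals[i] += score
    return tuple(t / runs for t in totals)
```

Seeding each split makes the five runs reproducible while still giving different train/test partitions.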
PROPOSED APPROACH
Results for TW dataset
PROPOSED APPROACH
Experimental Setup B
1.  Use labelled articles from DBpedia (DB) and Freebase
(FB) for training
-  Baseline: Bag of Words (BoW), Bag of Entities (BoE),
and Part of Speech tags (PoS).
-  Enhance Features using the DBpedia and Freebase
graphs.
2.  Train an SVM classifier on each of the DB, FB, DB+FB and DB+FB+TW training corpora and test on TW, using an 80%/20% train/test split over five independent runs.
3.  Compute Precision, Recall, and F-measure.
PROPOSED APPROACH
Results for training on KS articles and testing on TW
PROPOSED APPROACH
Factors contributing to the performance of a KS graph for TC
1.  Topic-Class Entropy
2.  Entity-Class Entropy
3.  Topic-Class-Property Entropy
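The slides name three entropy metrics without defining them. As a sketch under that gap, all three can be read as Shannon entropy over different label distributions: the classes of the resources in a topic (topic-class), the classes an entity is linked to (entity-class), or (class, property) pairs within a topic (topic-class-property). These readings, and the toy class list, are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical classes of the resources filed under one topic in one KS.
topic_classes = ["Person", "Person", "Place", "Country"]
print(shannon_entropy(topic_classes))  # 1.5
```

Labels can be any hashable value, so the same function accepts (class, property) tuples for the third metric.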
PROPOSED APPROACH
Correlating Entropy metrics with the performance of the
cross-source TC classifiers.
This indicates that the more ambiguous entities a topic has within a KS graph, the lower the TC performance.
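The slides do not state which correlation coefficient was used; the sketch below assumes Pearson's r between per-setting entropy values and the corresponding F-measures. Under this reading, the finding above corresponds to a negative r.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)
```

Usage would be `pearson(entropy_per_setting, f_measure_per_setting)`, with one value per KS/topic setting taken from the experiments (the two list names are hypothetical).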
FINDINGS
1.  KSs combined with Twitter data provide complementary information for the TC of tweets, outperforming both the KS-only approaches and the tweets-only approach.
2.  A KS's TC performance depends on its coverage of the entities involved.
3.  When entities have low coverage in a KS, exploiting the mappings between the corresponding KSs’ ontologies is beneficial.
CONCLUSIONS
•  Explored the task of topic classification of tweets
•  Exploited information in KSs (e.g., DBpedia, Freebase) using semantic graphs of the concepts and properties surrounding an entity.
•  Showed the importance of considering graph structures in KSs for the supervised classification of tweets, achieving significant improvements over state-of-the-art approaches that use either a single KS or tweets only.
CONTACT US
A.  Elizabeth Cano
•  http://people.kmi.open.ac.uk/cano/
B.  Andrea Varga
•  http://sites.google.com/site/missandreavarga/
C.  Matthew Rowe
•  http://lancs.ac.uk/staff/rowem/
D.  Fabio Ciravegna
•  http://staffwww.dcs.shef.ac.uk/people/F.Ciravegna
E.  Yulan He
•  http://www1.aston.ac.uk/eas/staff/dr-yulan-he

Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Recently uploaded (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Harnessing Linked Knowledge Sources for Topic Classification in Social Media

  • 1. A. Elizabeth Cano¹, Andrea Varga², Matthew Rowe³, Fabio Ciravegna², and Yulan He⁴. ¹Knowledge Media Institute, The Open University, Milton Keynes; ²University of Sheffield, Sheffield; ³Lancaster University, Lancaster; ⁴Aston University, Birmingham. UK, 2013. Harnessing Linked Knowledge Sources for Topic Classification in Social Media
  • 2. INTRODUCTION Social Media Streams - Risk in violent and criminal activities
  • 3. INTRODUCTION Research Questions: (1) Can semantic features help in topic classification (TC)? (2) Which knowledge source (KS) data and KS taxonomies provide useful information for improving the TC of tweets?
  • 4. OUTLINE: Introduction (Topic Classification (TC) of Microposts; Related Work; State-of-the-art limitations); Proposed Approach; Experiments; Findings; Conclusions
  • 5. INTRODUCTION Difficulties of topic classification of microposts: restricted number of characters; irregular and ill-formed words (mixing upper- and lowercase letters makes it difficult to detect proper nouns and other part-of-speech tags; wide variety of language, e.g. “see u soon”); event-dependent emerging jargon (volatile jargon relevant to particular events, e.g. “Jan.25”, used during the Egyptian revolution); high topical diversity; sparse data.
  • 6. INTRODUCTION Social Knowledge Sources (KS)
                      DBpedia*       Yago2          Freebase
        Resources     2.35 million   447 million    3.6 million
        Classes       359            562,312        1,450
        Properties    1,820          253,213,842    7,000
        *Using the DBpedia ontology.
    Structured Semantic Web representations of data, maintained by thousands of editors (e.g. DBpedia, derived from Wikipedia, and Freebase), which evolve and adapt as knowledge changes [Syed et al., 2008]. KSs cover a broad range of topics and characterise topics with a large number of resources.
  • 7. INTRODUCTION Local and External Metadata of a Tweet
  • 8. INTRODUCTION Local and External Metadata of a Tweet. NER annotations: NER:Country, NER:Person, NER:Person
  • 9. INTRODUCTION Local and External Metadata of a Tweet. NER annotations: NER:Country, NER:Person, NER:Person; linked resources: <http://dbpedia.org/resource/Barack_Obama>, <http://dbpedia.org/resource/Egypt>, <http://dbpedia.org/resource/Hosni_Mubarak>
  • 10. PROPOSED APPROACH: Exploiting Knowledge Sources for the Topic Classification of Microposts. State-of-the-art limitations: use of single knowledge sources; entity metadata is constrained by the NER service used (e.g. OpenCalais, Alchemy). Our approach: exploits multiple knowledge sources; enhances entity metadata by deriving semantic graphs; leverages the graph structures surrounding entities in a KS for the TC task.
  • 11. OUTLINE • Introduction • Proposed Approach • Semantic Meta-graphs • Weighting Schemas • Enhancing TC with Semantic Features • Experiments • Findings • Conclusions
  • 13. PROPOSED APPROACH Rationale: of the two example tweets (1 and 2), tweet 1 could be more indicative of War and Conflict.
  • 14. PROPOSED APPROACH Rationale: tweet 2 is not necessarily a good indicator of War and Conflict.
  • 15. PROPOSED APPROACH Rationale: given tweets 1 and 2, can the graph structure of existing knowledge sources provide an abstraction of how these entity types are used to represent a topic?
  • 16. PROPOSED APPROACH Framework for Topic Classification of Tweets [figure: pipeline of Retrieve Articles (DB, FB, DB-FB), Retrieve Tweets (TW), Concept Enrichment, Derive Semantic Features, Build Cross-Source Topic Classifier, Annotate Tweets]. Step 1, Datasets Collection: a SPARQL query for all resources from a given topic (e.g. War).
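The slides do not show the collection query itself. As a minimal sketch, assuming DBpedia's usual category layout (articles point to categories via `dct:subject`, sub-categories point to their super-categories via `skos:broader`), a query for all resources under a topic category and its sub-categories could be built as below. The helper `topic_query` and the category name `Wars` are hypothetical placeholders, not the authors' actual query.

```python
# Hypothetical helper: only builds a SPARQL 1.1 query string, it does not
# contact any endpoint. Assumes DBpedia's layout in which articles point to
# categories via dct:subject and categories point upward via skos:broader.
def topic_query(category: str, limit: int = 10000) -> str:
    """Return a query for English abstracts of resources filed under
    `category` or any of its (transitive) sub-categories."""
    return f"""PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbc:  <http://dbpedia.org/resource/Category:>
PREFIX dbo:  <http://dbpedia.org/ontology/>
SELECT DISTINCT ?resource ?abstract WHERE {{
  ?resource dct:subject/skos:broader* dbc:{category} .
  ?resource dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}}
LIMIT {limit}"""


if __name__ == "__main__":
    # "Wars" is an illustrative category name, not taken from the slides.
    print(topic_query("Wars"))
```

Unbounded property paths such as `skos:broader*` can be slow or time out on a public endpoint, and Wikipedia's category graph is not a strict hierarchy, so a practical version might cap the depth; the slides do not say how deep the collection went.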
  • 17. PROPOSED APPROACH Framework for Topic Classification of Tweets. Step 2, Datasets Enrichment: from tweets and articles’ abstracts, extract entities and link them to resources in DBpedia and Freebase.
  • 20. PROPOSED APPROACH Framework for Topic Classification of Tweets. Step 3: Semantic Features Derivation.
  • 21. PROPOSED APPROACH Framework for Topic Classification of Tweets. Step 4: build a topic classifier based on features derived from multiple (cross) sources.
  • 23. PROPOSED APPROACH Deriving Semantic Meta-Graphs <dbpedia:Barack_Obama, rdf:type, yago:PresidentOfTheUnitedStates> <dbpedia:Barack_Obama, dbo:birthPlace, dbpedia:Hawaii>
  • 25. PROPOSED APPROACH Definition 1 (Resource Meta-graph): a tuple $G := (R, P, C, Y)$ where $R$, $P$, $C$ are finite sets whose elements are resources, properties and classes, and $Y \subseteq R \times P \times C$ is a ternary relation representing a hypergraph with ternary edges, namely the tripartite hypergraph $H(Y) = \langle V, D \rangle$ with vertices $V = R \cup P \cup C$ and edges $D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$.
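Definition 1 says what $G$ and $H(Y)$ are but not how $Y$ is populated from raw triples. The sketch below assumes, as a hypothetical construction, that every non-`rdf:type` property of a resource is paired with every class the resource is typed with. Projecting $G$ onto its property set `P` or class set `C` then gives the two views of an entity's meta-graph. The helper names and the two marked example triples are illustrative additions.

```python
from typing import NamedTuple

RDF_TYPE = "rdf:type"


class MetaGraph(NamedTuple):
    R: frozenset  # resources
    P: frozenset  # properties
    C: frozenset  # classes
    Y: frozenset  # ternary relation, Y ⊆ R × P × C


def build_meta_graph(triples):
    """Build G = (R, P, C, Y) from (subject, predicate, object) triples.

    Assumption: each non-rdf:type property of a subject is related to
    every class that subject is typed with.
    """
    props, classes = {}, {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            classes.setdefault(s, set()).add(o)
        else:
            props.setdefault(s, set()).add(p)
    Y = frozenset(
        (r, p, c)
        for r in props.keys() & classes.keys()  # subjects with both kinds
        for p in props[r]
        for c in classes[r]
    )
    return MetaGraph(
        R=frozenset(r for r, _, _ in Y),
        P=frozenset(p for _, p, _ in Y),
        C=frozenset(c for _, _, c in Y),
        Y=Y,
    )


def hypergraph(g):
    """H(Y) = <V, D>: V = R ∪ P ∪ C, one 3-vertex edge {r, p, c} per tuple in Y."""
    return g.R | g.P | g.C, {frozenset(t) for t in g.Y}


# Triples 1 and 3 appear on the slides; triples 2 and 4 are illustrative.
example_triples = [
    ("dbpedia:Barack_Obama", "rdf:type", "yago:PresidentOfTheUnitedStates"),
    ("dbpedia:Barack_Obama", "rdf:type", "dbo:Person"),
    ("dbpedia:Barack_Obama", "dbo:birthPlace", "dbpedia:Hawaii"),
    ("dbpedia:Barack_Obama", "dbo:spouse", "dbpedia:Michelle_Obama"),
]
```

With two properties and two classes for one resource, this construction yields four tuples in $Y$ and five vertices in $V$.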
  • 26. PROPOSED APPROACH Resource Meta-graph: the meta-graph of entity e is the aggregation of all resources, properties and classes related to this entity. [Figure: Obama's meta-graph projected on properties (birthPlace, author, spouse) and projected on classes (LivingPeople, PresidentOfTheUnitedStates, Person, Author).]
  • 27. PROPOSED APPROACH Resource Meta-graph (the same projections of Obama's meta-graph as on slide 26): how can we weight these graphs to reveal which semantic features characterise Obama in the context of violence?
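  • 28. PROPOSED APPROACH Weighting Semantic Features. Specificity measures the relative importance of a property to a given class in a KS graph $G_{KS}$. For $p \in G(e)$ and $c \in G(e)$: $\mathrm{specificity}_{KS}(p, c) = \frac{N_p(R(c))}{N(R(c))}$, where $N_p(R(c))$ is the number of times property $p$ appears in all resources of type $c$, and $N(R(c))$ is the number of resources of type $c$.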
  • 29. PROPOSED APPROACH Weighting Semantic Features. Generality captures the specialisation of a property $p$ to a given class $c$ by computing the property's frequency among other semantically related classes $R'(c)$: $\mathrm{generality}_{KS}(p, c) = \frac{N(R'(c))}{N_p(R'(c))}$, where $N(R'(c))$ is the number of resources whose type is either $c$ or a specialisation of $c$'s parent classes.
  • 30. PROPOSED APPROACH Weighting Semantic Features: $SG(p, c) = \mathrm{specificity}_{KS}(p, c) \cdot \mathrm{generality}_{KS}(p, c)$
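To make the three formulas concrete, here is a sketch over a tiny, invented KS graph. Two readings are assumptions, not taken from the slides. First, "specialisation of c's parent classes" is read as direct sub-classes of c's direct parents, i.e. c and its siblings. Second, a property that never occurs among the related resources gets weight 0 rather than a division by zero. All resource, class and property names are hypothetical.

```python
from collections import Counter

# Toy KS graph: resource -> (set of classes, Counter of property occurrences).
# Names and counts are invented for illustration only.
TOY_KS = {
    "r1": ({"President"}, Counter({"birthPlace": 1, "spouse": 1})),
    "r2": ({"President"}, Counter({"birthPlace": 1})),
    "r3": ({"Monarch"}, Counter({"birthPlace": 1, "reign": 1})),
    "r4": ({"Monarch"}, Counter({"reign": 2})),
}
TOY_PARENT_OF = {"President": {"Leader"}, "Monarch": {"Leader"}}  # assumed subsumption


def resources_of(ks, classes):
    """R(.): resources typed with at least one class in `classes`."""
    return [r for r, (types, _) in ks.items() if types & classes]


def n_p(ks, resources, p):
    """N_p(.): number of times property p appears across `resources`."""
    return sum(ks[r][1][p] for r in resources)


def related_classes(c, parent_of):
    """Classes behind R'(c): c plus every class sharing a direct parent with c (assumed)."""
    parents = parent_of.get(c, set())
    return {c} | {k for k, ps in parent_of.items() if ps & parents}


def specificity(ks, p, c):
    rc = resources_of(ks, {c})
    return n_p(ks, rc, p) / len(rc) if rc else 0.0


def generality(ks, p, c, parent_of):
    rr = resources_of(ks, related_classes(c, parent_of))
    count = n_p(ks, rr, p)
    return len(rr) / count if count else 0.0  # unseen property: weight 0 (assumed)


def sg(ks, p, c, parent_of):
    return specificity(ks, p, c) * generality(ks, p, c, parent_of)
```

In this toy graph, `spouse` scores SG = 0.5 × 4 = 2.0 for `President`, while the more widespread `birthPlace` scores 1.0 × 4/3 ≈ 1.33. One way to read SG is therefore as a TF-IDF-like weight that favours properties frequent within a class but rare among related classes.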
  • 31. PROPOSED APPROACH Enhancing Feature Space with Semantic Features. Semantic Augmentation (A1): class features, $F'_{A1-C} = F + c_F$; property features, $F'_{A1-P} = F + p_F$; class + property features, $F'_{A1-C+P} = F + c_F + p_F$.
  • 32. PROPOSED APPROACH Enhancing Feature Space with Semantic Features. Semantic Augmentation (A1): class features, $F'_{A1-C} = F + c_F$; property features, $F'_{A1-P} = F + p_F$; class + property features, $F'_{A1-C+P} = F + c_F + p_F$. Example: $F$: president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt; $F_{A1+P}$: dbpedia:birth, dbpedia:state, …, dbpedia-owl:PopulatedPlace/populationDensity, …; $F_{A1+C}$: PopulatedPlace, Office_holder, PresidentOfTheUnitedStates, Politician, …
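The three A1 variants differ only in which semantic features are appended to the bag of words $F$. The sketch below covers all three. Prefixing semantic features with `class:` / `prop:` is an assumed convention that keeps them distinct from word tokens; the slides do not say how the features were encoded. The word list is the slide's example $F$, and the class and property sets are a subset of the slide's examples.

```python
def augment_a1(words, classes, properties, mode="C+P"):
    """Semantic augmentation A1 of a bag of words F.

    mode "C":   F' = F + c_F
    mode "P":   F' = F + p_F
    mode "C+P": F' = F + c_F + p_F
    """
    if mode not in ("C", "P", "C+P"):
        raise ValueError(f"unknown mode: {mode!r}")
    features = list(words)
    if "C" in mode:
        features += ["class:" + c for c in sorted(classes)]
    if "P" in mode:
        features += ["prop:" + p for p in sorted(properties)]
    return features


# Example F from the slide.
TWEET_WORDS = ["president", "obama", "televised", "statement", "hosni",
               "mubarak", "resignation", "cnn", "says", "egypt"]
A1_CLASSES = {"PresidentOfTheUnitedStates", "Politician"}
A1_PROPERTIES = {"dbpedia:birth"}
```

The output is a plain token list, so it can feed any bag-of-features vectoriser unchanged.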
  • 33. PROPOSED APPROACH Enhancing Feature Space with Semantic Features. Semantic Augmentation with Generalisation (A2): this augmentation exploits the subsumption relation among classes within the DBpedia or Freebase ontologies; in this case we consider the set of parent classes of $c$. Parent(c) features, $F'_{A2-C} = F + \mathrm{parent}(c)_F$; parent(c) + property features, $F'_{A2-C+P} = F + p_F + \mathrm{parent}(c)_F$.
  • 34. PROPOSED APPROACH Enhancing Feature Space with Semantic Features. Semantic Augmentation with Generalisation (A2): this augmentation exploits the subsumption relation among classes within the DBpedia or Freebase ontologies; in this case we consider the set of parent classes of $c$. Parent(c) features, $F'_{A2-C} = F + \mathrm{parent}(c)_F$; parent(c) + property features, $F'_{A2-C+P} = F + p_F + \mathrm{parent}(c)_F$. Example: $F$: president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt; $F_{A2+\mathrm{parent}(c)}$: Place, Office_holder, President, Politician, …
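A2 replaces each class $c$ with its parent classes before appending them to $F$. The sketch assumes a direct-superclass map; in practice this would come from the DBpedia or Freebase ontology. `EXAMPLE_PARENT_OF` is a hypothetical two-entry map chosen to echo the slide's example (`PopulatedPlace` → `Place`), not data read from either KS.

```python
def parent_classes(classes, parent_of):
    """parent(c) for every c in `classes`, via a direct-superclass map."""
    return {parent for c in classes for parent in parent_of.get(c, ())}


def augment_a2(words, classes, properties, parent_of, with_properties=False):
    """A2-C:   F' = F + parent(c)_F
    A2-C+P: F' = F + p_F + parent(c)_F
    Classes with no known parent contribute nothing."""
    features = list(words)
    if with_properties:
        features += ["prop:" + p for p in sorted(properties)]
    features += ["class:" + c for c in sorted(parent_classes(classes, parent_of))]
    return features


# Hypothetical subsumption map, for illustration only.
EXAMPLE_PARENT_OF = {
    "PopulatedPlace": {"Place"},
    "PresidentOfTheUnitedStates": {"President"},
}
```

Because the original classes are not kept, A2 trades specificity for coverage: two tweets mentioning different kinds of places can share the single feature `class:Place`.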
  • 36. PROPOSED APPROACH Datasets. Twitter dataset [Abel et al., 2011] (TW): collected over two months starting in November 2010; topically annotated; we use tweets labelled as “War & Conflict” (War), “Law & Crime” (Cri) and “Disaster & Accident” (DisAcc); a multi-labelled dataset comprising 10,189 tweets. DBpedia (DB) and Freebase (FB) datasets: the SPARQL endpoints were queried for all resources from the categories and subcategories (skos:Concept) of War, Cri and DisAcc, yielding 9,465 DBpedia articles and 16,915 Freebase articles.
  • 38. PROPOSED APPROACH Experimental Setup A. (1) Use annotated tweets (TW) for training. Baselines: Bag of Words (BoW), Bag of Entities (BoE) and Part-of-Speech tags (PoS); features are then enhanced using the DBpedia and Freebase graphs. (2) Train an SVM classifier on the TW corpus, training/testing on an 80%/20% split over five independent runs. (3) Compute precision, recall and F-measure.
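The "80%-20% over five independent runs" protocol can be sketched as repeated seeded shuffles. The slides do not say whether splits were stratified by topic or which SVM implementation was used, so the classifier is left out and any implementation could consume these splits. `split_runs` and its parameters are hypothetical names.

```python
import random


def split_runs(items, runs=5, train_frac=0.8, seed=0):
    """Yield `runs` independent (train, test) splits of `items`.

    Each run reshuffles the full data with a seeded RNG, so runs differ
    from one another while the whole experiment stays reproducible.
    """
    rng = random.Random(seed)
    data = list(items)
    for _ in range(runs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        yield shuffled[:cut], shuffled[cut:]
```

Scores from the five runs would then typically be averaged.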
  • 40. PROPOSED APPROACH Experimental Setup B. (1) Use labelled articles from DBpedia (DB) and Freebase (FB) for training. Baselines: Bag of Words (BoW), Bag of Entities (BoE) and Part-of-Speech tags (PoS); features are then enhanced using the DBpedia and Freebase graphs. (2) Train an SVM classifier on the DB, FB, DB+FB and DB+FB+TW training corpora and test on TW, training/testing on an 80%/20% split over five independent runs. (3) Compute precision, recall and F-measure.
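Since TW is multi-labelled, one straightforward way to compute precision, recall and F-measure is per topic, treating each label as a binary decision. The slides do not specify the averaging scheme, so this is a sketch under that assumption. Each tweet's gold and predicted labels are represented as Python sets; the example labels below are invented.

```python
def prf(gold, predicted, label):
    """Precision, recall and F1 for one topic label.

    `gold` and `predicted` are parallel lists of label sets, one per tweet.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if label in g and label in p)
    fp = sum(1 for g, p in pairs if label not in g and label in p)
    fn = sum(1 for g, p in pairs if label in g and label not in p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented example: four tweets with gold and predicted topic sets.
GOLD = [{"War"}, {"War", "Cri"}, {"DisAcc"}, {"Cri"}]
PRED = [{"War"}, {"Cri"}, {"War"}, {"Cri"}]
```

Here `War` gets one true positive, one false positive and one false negative, giving (0.5, 0.5, 0.5). `Cri` is predicted perfectly.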
  • 41. PROPOSED APPROACH Results for Training on KS articles, and Testing on TW
  • 42. PROPOSED APPROACH Factors contributing to the performance of a KS graph for TC 1.  Topic-Class Entropy 2.  Entity-Class Entropy 3.  Topic-Class-Property Entropy
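The slides name the three entropy measures but give no formulas. The speaker notes only describe their interpretation: low Topic-Class Entropy means a focused topic, low Entity-Class Entropy means less ambiguous entities, and low Topic-Class-Property Entropy means a few dominant class-properties. The sketch assumes Shannon entropy in bits. For Entity-Class Entropy, it further assumes that each entity's classes are equally likely and that per-entity values are averaged over the topic. The example entities and classes are invented.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a Counter of observations."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = (n / total for n in counts.values() if n > 0)
    return sum(-p * math.log2(p) for p in probs)


def topic_class_entropy(entity_classes):
    """Entropy of the class distribution over all entities of one topic."""
    return entropy(Counter(c for cs in entity_classes.values() for c in cs))


def entity_class_entropy(entity_classes):
    """Mean per-entity entropy, each entity's classes equally likely (assumed)."""
    if not entity_classes:
        return 0.0
    per_entity = [entropy(Counter(cs)) for cs in entity_classes.values()]
    return sum(per_entity) / len(per_entity)


def topic_class_property_entropy(class_property_pairs):
    """Entropy of the (class, property) pair distribution of one topic."""
    return entropy(Counter(class_property_pairs))


# Invented topic: entity -> set of KS classes.
EXAMPLE_TOPIC = {
    "Obama": {"Person", "Politician"},
    "Egypt": {"Country"},
    "Mubarak": {"Person", "Politician"},
}
```

An entity with a single class contributes zero Entity-Class Entropy. Each two-class entity contributes one bit, so the example topic averages 2/3.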
  • 43. PROPOSED APPROACH Correlating Entropy metrics with the performance of the cross-source TC classifiers.
  • 44. PROPOSED APPROACH Correlating Entropy metrics with the performance of the cross-source TC classifiers. Indicates that the higher the number of ambiguous entities in a topic within a KS graph, the lower the performance of the TC.
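The slides do not name the correlation coefficient behind the correlation figure; Pearson's r is one common choice and is assumed here. A negative r between a topic's Entity-Class Entropy and its F-measure would match the reading on the slide: more ambiguous entities, lower TC performance. The numbers below are made up purely to exercise the function and are not the paper's results.

```python
import math


def pearson(xs, ys):
    """Pearson's r between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Made-up values, NOT the paper's measurements.
ENTROPY_PER_TOPIC = [0.5, 1.0, 1.5]
F1_PER_TOPIC = [0.9, 0.8, 0.7]
```

With these invented values, `pearson(ENTROPY_PER_TOPIC, F1_PER_TOPIC)` is -1 up to floating-point rounding, because F1 falls linearly as entropy rises.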
  • 45. FINDINGS (1) KSs combined with Twitter data provide complementary information for the TC of tweets, outperforming both the KS-only approaches and the approach using tweets only. (2) A KS's performance on TC depends on the coverage of the entities within that KS. (3) When entities have low coverage in a KS, exploiting the mapping between the corresponding KSs' ontologies is beneficial.
  • 46. CONCLUSIONS: explored the task of topic classification of tweets; exploited information in KSs (e.g. DBpedia, Freebase) using semantic graphs of the concepts and properties surrounding an entity; showed the importance of considering graph structures in KSs for the supervised classification of tweets, achieving a significant improvement over state-of-the-art approaches that use either a single KS or tweets only.
  • 47. CONTACT US: A. Elizabeth Cano, http://people.kmi.open.ac.uk/cano/; Andrea Varga, http://sites.google.com/site/missandreavarga/; Matthew Rowe, http://lancs.ac.uk/staff/rowem/; Fabio Ciravegna, http://staffwww.dcs.shef.ac.uk/people/F.Ciravegna; Yulan He, http://www1.aston.ac.uk/eas/staff/dr-yulan-he

Editor's Notes

  1. I will present work done in collaboration with the universities of Sheffield, Lancaster and Aston. This work was done as part of the Violence Detection project, which investigates different approaches for detecting violence-related events emerging from social media streams.
  2. During the last two years we have witnessed the use of these services to express different emotions within society; these services have become a proxy of information that communicates the social perception of situations regarding, for example, terrorism, social crises and racism. Therefore, the real-time identification of the topics discussed in these channels could aid in different scenarios, including violence detection and emergency response.
  3. Our intuition is that in the first case, the role of Obama as President of the United States could be more indicative of the topic War and Conflict.
  5. These two tweets refer to the same entity, “President Obama”. However, the context in which the entity is used differs: in the first case, the co-occurrence of Obama, Egypt and Mubarak could be more indicative of the War and Conflict topic, while in the second case the occurrence of President Obama and Michelle is less likely to indicate a war- and conflict-related topic. So we wonder whether the graph structure of existing knowledge sources could help provide an abstraction of how these entity types are used to represent a topic.
  9. How can we weight these graphs so as to reveal which of these features characterise Obama in the context of violence?
  10. To capture the relative importance of each feature in a semantic meta-graph, we propose two weighting strategies based on the specificity and generality of a feature in a given meta-graph. Combined, they model the relative importance of a property p to a given class together with the generality of the property in a KS's graph. Here N_p is the number of times property p appears in all resources of type c in the KS graph.
  12. Here parent(c) denotes the total number of unique parent classes derived from a KS graph.
  13. To evaluate the impact of enhancing the feature space with semantic features on the topic classification of tweets, we used a large corpus of tweets and two large-coverage KSs, DBpedia and Freebase. The Twitter dataset was previously built by Abel et al.; it comprises tweets collected over two months starting in November 2010 and has been topically annotated.
  14. For each tweet and each article we applied Lovins stemming and extracted entities using OpenCalais and Zemanta. Then, as described before, we built the semantic meta-graphs from the DBpedia and Freebase KSs. Note that the Twitter dataset consists of tweets that contain at least one entity.
  15. Topic-Class Entropy: low entropy (LE) indicates a focused topic, while high entropy (HE) indicates that the topic is more random in the subjects it discusses. Entity-Class Entropy: LE indicates that a topic is less ambiguous (i.e. its entities belong to fewer classes), while HE indicates high ambiguity at the level of the entities. Topic-Class-Property Entropy: LE indicates that a topic is dominated by a few class-properties, while HE reveals high property diversity.
  16. The darker (closer to red) a cell, the more correlated the values are. This indicates that as the number of ambiguous entities in a topic increases, the performance of the TC decreases.