Alberto Parravicini
Rhicheek Patra
Davide Bartolini
Marco Santambrogio
2019-05-{17-31}, NGCX
Fast Entity Linking via Graph Embeddings
Alberto Parravicini 23/05/2019
Entity Linking (EL): connecting words of
interest to unique identities (e.g. a Wikipedia
page)
Entity Linking
Component of applications that require
high-level representations of text:
1. Search Engines, for semantic search
2. Recommender Systems, to retrieve documents
similar to each other
3. Chat bots, to understand intents and entities
Use Cases
An EL system requires 2 steps:
1. Named Entity Recognition (NER): spot
mentions (a.k.a. Named Entities)
● State-of-the-art NER achieves high accuracy [1]
The EL Pipeline (1/2)
[1] Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging."
An EL system requires 2 steps:
2. Entity Linking: connect mentions to entities
The EL Pipeline (2/2)
● Novel unsupervised framework for EL
● No dependency on NLP
● First EL algorithm to use graph embeddings
● Accuracy similar to supervised SoA techniques
● Highly scalable, with real-time execution
● < 1 sec to process a text with 30+ mentions
Our contributions
Our EL Pipeline
[Pipeline diagram. Numbered steps: 1. Graph Building, 2. Embeddings, 3. Candidate Finder, 4. Disambiguation. Other labels: Input text with NER, Output, Graph Learning, Execution.]
We obtain a large graph from DBpedia
● All the information of Wikipedia, stored as triples
● 12M entities, 170M links
Step 1/4
Graph Creation
From DBpedia, we build two graphs:
● Redirects Graph
● Property Graph
Step 1/4
Graph Creation
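As a rough illustration of this step, the sketch below splits a few subject/predicate/object triples into a redirects graph and a property graph. The sample triples, the predicate label `redirects`, and the function `build_graphs` are hypothetical names chosen for the example; the slides do not describe how the authors parse DBpedia.

```python
from collections import defaultdict

# Hypothetical predicate label marking a redirect triple (an assumption,
# not the actual DBpedia predicate name).
REDIRECT = "redirects"

def build_graphs(triples):
    """Split (subject, predicate, object) triples into two adjacency maps."""
    redirects = {}                 # alias vertex -> vertex it redirects to
    properties = defaultdict(set)  # vertex -> set of (predicate, neighbour)
    for subject, predicate, obj in triples:
        if predicate == REDIRECT:
            redirects[subject] = obj
        else:
            properties[subject].add((predicate, obj))
    return redirects, dict(properties)

# Illustrative triples only; not taken from the real dataset.
triples = [
    ("NYC", "redirects", "New_York_City"),
    ("New_York_City", "country", "United_States"),
    ("New_York_City", "state", "New_York_(state)"),
]
redirects, properties = build_graphs(triples)
```

Keeping redirects in a separate map makes it cheap to resolve aliases later without walking the much larger property graph.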
● Graph embeddings encode vertices as vectors
● “Similar” vertices have “similar” embeddings
● Idea: entities with the same context should have
low embedding distance
Embeddings Creation (1/2)
Step 2/4
● In our work, we use DeepWalk [1]
● Like word2vec [2], it learns embeddings from
sequences: random walks (i.e. vertex sequences)
play the role of sentences
● Embedding size 170, walk length 8
● DeepWalk uses only the graph topology
● It is a simple baseline; better algorithms that
also leverage graph features could be used
Embeddings Creation (2/2)
[1] Bryan Perozzi et al. 2014. DeepWalk
[2] Tomas Mikolov et al. 2013. Distributed
representations of words and phrases and
their compositionality
Step 2/4
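The random-walk stage of a DeepWalk-style method can be sketched as follows. This is a minimal illustration, not the authors' code: the toy graph, the `walks_per_vertex` parameter, and the reading of "walk length 8" as 8 vertices per walk are assumptions. The resulting vertex sequences would then be passed to a skip-gram model, as word2vec does with sentences.

```python
import random

def random_walks(adjacency, walk_length=8, walks_per_vertex=2, seed=0):
    """Generate uniform random walks starting from every vertex."""
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(walks_per_vertex):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency.get(walk[-1], [])
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# Toy undirected graph stored as adjacency lists (illustrative only).
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
walks = random_walks(adjacency)
```

Each walk uses only the graph topology, which matches the slide's remark that DeepWalk ignores vertex features.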
Candidates Finder
Idea: for each mention, select a few candidate
vertices with index-based string similarity
● Resolve ambiguity by following redirect and
disambiguation edges
Step 3/4
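Following redirect and disambiguation edges could look like the sketch below. The dictionaries `redirects` and `disambiguations`, the helper names, and the `max_hops` guard are assumptions made for illustration; the slides do not specify the data structures used.

```python
def resolve(vertex, redirects, max_hops=10):
    """Follow redirect edges until a vertex with no outgoing redirect is reached."""
    seen = {vertex}
    for _ in range(max_hops):
        target = redirects.get(vertex)
        if target is None or target in seen:
            break  # no further redirect, or a redirect cycle
        seen.add(target)
        vertex = target
    return vertex

def expand_candidate(vertex, redirects, disambiguations):
    """Replace a matched vertex by the concrete entities it may stand for."""
    canonical = resolve(vertex, redirects)
    options = disambiguations.get(canonical)
    if options:
        return [resolve(option, redirects) for option in options]
    return [canonical]

# Illustrative edges only.
redirects = {"Big_Apple": "NYC", "NYC": "New_York_City"}
disambiguations = {"Paris_(disambiguation)": ["Paris", "Paris_Hilton"]}

single = expand_candidate("Big_Apple", redirects, disambiguations)
several = expand_candidate("Paris_(disambiguation)", redirects, disambiguations)
```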
Disambiguation (1/3)
● We want to pick the “best” candidate for
each mention
● In a “good” solution, candidates are related to
each other (e.g. Donald Trump, Hillary Clinton)
● Observation: a good tuple of candidates has
embeddings close to each other
● Evaluating all combinations is infeasible
● 10 mentions with 100 candidates each give 100^10 (= 10^20) combinations
Step 4/4
● We use a heuristic state-space search
algorithm to maximize an objective that combines:
○ the sum of string similarities
○ the sum of embedding cosine similarities
w.r.t. the tuple mean
Step 4/4
Disambiguation (2/3)
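One way to read this objective is sketched below: the score of a candidate tuple is the sum of its string similarities plus the sum of the cosine similarities between each candidate embedding and the mean embedding of the tuple. Adding the two terms without weights is an assumption, as are the function names and the toy 2-dimensional embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def tuple_score(candidates):
    """candidates: one (string_similarity, embedding) pair per mention."""
    dim = len(candidates[0][1])
    mean = [sum(emb[i] for _, emb in candidates) / len(candidates)
            for i in range(dim)]
    string_term = sum(sim for sim, _ in candidates)
    embedding_term = sum(cosine(emb, mean) for _, emb in candidates)
    # Unweighted sum of the two terms (an assumption).
    return string_term + embedding_term

# Same string similarities, different embedding coherence (toy values).
coherent = [(0.9, [1.0, 0.0]), (0.8, [0.9, 0.1])]
incoherent = [(0.9, [1.0, 0.0]), (0.8, [0.0, 1.0])]
```

With equal string similarities, the tuple whose embeddings point in similar directions scores higher, which is the behaviour the observation on the previous slide calls for.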
● Iterative state-space heuristic
● Greedy iterative procedure
Step 4/4
Disambiguation (3/3)
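The slides do not detail the greedy procedure, so the following is only one plausible sketch: start from the candidate with the best string similarity for each mention, then repeatedly swap in any single candidate that improves the tuple score, stopping early when a full pass brings no improvement. The scoring function, `max_iters`, and all sample values are assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score(choice):
    """choice: one (name, string_similarity, embedding) triple per mention."""
    dim = len(choice[0][2])
    mean = [sum(c[2][i] for c in choice) / len(choice) for i in range(dim)]
    return sum(c[1] for c in choice) + sum(cosine(c[2], mean) for c in choice)

def greedy_disambiguate(candidates, max_iters=5):
    # Initial state: best string match for every mention.
    choice = [max(options, key=lambda c: c[1]) for options in candidates]
    best = score(choice)
    for _ in range(max_iters):
        improved = False
        for i, options in enumerate(candidates):
            for option in options:
                trial = choice[:i] + [option] + choice[i + 1:]
                trial_score = score(trial)
                if trial_score > best:
                    choice, best, improved = trial, trial_score, True
        if not improved:
            break  # early stop: a full pass changed nothing
    return [name for name, _, _ in choice], best

# Toy candidates for the mentions "Trump" and "Clinton".
candidates = [
    [("Donald Trump", 0.80, [1.0, 0.1]), ("Trump (card games)", 0.85, [0.0, 1.0])],
    [("Hillary Clinton", 0.90, [0.9, 0.2]), ("Clinton, Iowa", 0.70, [0.1, 1.0])],
]
names, final_score = greedy_disambiguate(candidates)
```

Each pass evaluates a number of tuples that grows linearly with the total number of candidates, instead of enumerating every combination.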
● We compared against 6 SoA EL algorithms, on 5
datasets
● Our Micro-averaged F1 score is comparable with
SoA supervised algorithms
Results: accuracy
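For reference, a micro-averaged F1 score pools true positives, false positives, and false negatives over all documents before computing precision and recall. The sketch below assumes links are represented as (mention, entity) pairs; the sample data is made up for illustration and is not a result from the presentation.

```python
def micro_f1(documents):
    """documents: list of (predicted_links, gold_links) pairs of sets."""
    tp = fp = fn = 0
    for predicted, gold in documents:
        tp += len(predicted & gold)   # correct links
        fp += len(predicted - gold)   # wrong links
        fn += len(gold - predicted)   # missed links
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up example with two documents.
documents = [
    ({("Trump", "Donald_Trump"), ("Clinton", "Bill_Clinton")},
     {("Trump", "Donald_Trump"), ("Clinton", "Hillary_Clinton")}),
    ({("Paris", "Paris")},
     {("Paris", "Paris"), ("France", "France")}),
]
f1 = micro_f1(documents)
```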
● Different settings enable real-time EL, with
minimal loss in accuracy
● E.g. number of iterations, early stop
Results: exec. time
● Execution time is well balanced between
Candidate Finder and Disambiguation
Results: exec. time of single steps
Thank you!
Fast Entity Linking via Graph Embeddings
● Novel unsupervised framework for EL
● First EL algorithm to use graph embeddings
○ Accuracy similar to supervised SoA techniques
● Real-time execution time
Alberto Parravicini, alberto.parravicini@polimi.it
Rhicheek Patra
Davide Bartolini
Marco Santambrogio
2019-05-{17-31}, NGCX
● Turn the topology and properties of each vertex
into a vector
● “Similar” vertices have “similar” embeddings
Embeddings
Step 2/4
[Figure: topology and properties are mapped to an embedding]
Hamilton, William L., Rex Ying, and Jure Leskovec.
"Representation learning on graphs: Methods and
applications." arXiv preprint arXiv:1709.05584 (2017).
… and join them together
Step 1/4
Graph Creation
Candidates Finder
Idea: for each mention, select a small number of
candidate vertices with string similarity
● We use a simple index-based string search
● Fuzzy matching with 2-grams and 3-grams
● This provides a simple baseline (60-70%
accuracy)
Step 3/4
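An index-based fuzzy search over 2-grams and 3-grams can be sketched as follows. The inverted index layout, the Jaccard ranking, the `top_k` cutoff, and the sample labels are assumptions made for illustration; the slides only state that 2-grams and 3-grams are used.

```python
from collections import defaultdict

def ngrams(text, sizes=(2, 3)):
    """Set of lowercase character n-grams of the given sizes."""
    text = text.lower()
    return {text[i:i + n] for n in sizes for i in range(len(text) - n + 1)}

def build_index(labels):
    """Inverted index: n-gram -> labels that contain it."""
    index = defaultdict(set)
    for label in labels:
        for gram in ngrams(label):
            index[gram].add(label)
    return index

def find_candidates(mention, index, top_k=3):
    grams = ngrams(mention)
    # Only labels sharing at least one n-gram with the mention are scored.
    candidates = set()
    for gram in grams:
        candidates |= index.get(gram, set())

    def jaccard(label):
        other = ngrams(label)
        return len(grams & other) / len(grams | other)

    scored = sorted(((jaccard(c), c) for c in candidates), reverse=True)
    return [(label, similarity) for similarity, label in scored[:top_k]]

labels = ["Donald Trump", "Donald Duck", "Hillary Clinton"]
index = build_index(labels)
result = find_candidates("Donald Trmp", index)  # misspelled mention
```

The index lookup avoids comparing the mention against every label, which matters when the graph holds millions of entities.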

Contenu connexe

Tendances

Configuration of classes
Configuration of classesConfiguration of classes
Configuration of classesanumshakeel5
 
Media4Math's Algebra Series
Media4Math's Algebra SeriesMedia4Math's Algebra Series
Media4Math's Algebra SeriesMedia4math
 
Information Flow based Ontology Mapping - 2002
Information Flow based Ontology Mapping - 2002Information Flow based Ontology Mapping - 2002
Information Flow based Ontology Mapping - 2002Yannis Kalfoglou
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in PythonAjay Ohri
 
Knowledge Graphs and Milestone
Knowledge Graphs and MilestoneKnowledge Graphs and Milestone
Knowledge Graphs and MilestoneBarry Norton
 
BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)
BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)
BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)Igalia
 
Fast Python: Master the Basics to Write Faster Code
Fast Python: Master the Basics to Write Faster CodeFast Python: Master the Basics to Write Faster Code
Fast Python: Master the Basics to Write Faster CodeChristopher Conlan
 
Integrate with Tracing and Logging
Integrate with Tracing and LoggingIntegrate with Tracing and Logging
Integrate with Tracing and LoggingAya Ozawa (Igarashi)
 
Unit 1 polynomial manipulation
Unit 1   polynomial manipulationUnit 1   polynomial manipulation
Unit 1 polynomial manipulationLavanyaJ28
 

Tendances (13)

Configuration of classes
Configuration of classesConfiguration of classes
Configuration of classes
 
Bio 180208
Bio 180208Bio 180208
Bio 180208
 
Rust presentation convergeconf
Rust presentation convergeconfRust presentation convergeconf
Rust presentation convergeconf
 
Media4Math's Algebra Series
Media4Math's Algebra SeriesMedia4Math's Algebra Series
Media4Math's Algebra Series
 
Information Flow based Ontology Mapping - 2002
Information Flow based Ontology Mapping - 2002Information Flow based Ontology Mapping - 2002
Information Flow based Ontology Mapping - 2002
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in Python
 
Knowledge Graphs and Milestone
Knowledge Graphs and MilestoneKnowledge Graphs and Milestone
Knowledge Graphs and Milestone
 
BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)
BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)
BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)
 
Fast Python: Master the Basics to Write Faster Code
Fast Python: Master the Basics to Write Faster CodeFast Python: Master the Basics to Write Faster Code
Fast Python: Master the Basics to Write Faster Code
 
brf to mathml
brf to mathmlbrf to mathml
brf to mathml
 
Intro to Elixir
Intro to ElixirIntro to Elixir
Intro to Elixir
 
Integrate with Tracing and Logging
Integrate with Tracing and LoggingIntegrate with Tracing and Logging
Integrate with Tracing and Logging
 
Unit 1 polynomial manipulation
Unit 1   polynomial manipulationUnit 1   polynomial manipulation
Unit 1 polynomial manipulation
 

Similaire à Fast Entity Linking via Graph Embeddings

Gospel - High-performance heterogeneous architectures for graph analytics
 Gospel - High-performance heterogeneous architectures for graph analytics Gospel - High-performance heterogeneous architectures for graph analytics
Gospel - High-performance heterogeneous architectures for graph analyticsNECST Lab @ Politecnico di Milano
 
IRJET- Python Based Machine Learning for Profile Matching
IRJET- 	  Python Based Machine Learning for Profile MatchingIRJET- 	  Python Based Machine Learning for Profile Matching
IRJET- Python Based Machine Learning for Profile MatchingIRJET Journal
 
Pattern-based Definition and Generation of Components for a Synchronous React...
Pattern-based Definition and Generation of Components for a Synchronous React...Pattern-based Definition and Generation of Components for a Synchronous React...
Pattern-based Definition and Generation of Components for a Synchronous React...ijeukens
 
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSINGFEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSINGIJCI JOURNAL
 
Colombo+ronzoni+fontana
Colombo+ronzoni+fontanaColombo+ronzoni+fontana
Colombo+ronzoni+fontanaAjay Ohri
 
Master in Big Data Analytics and Social Mining 20015
Master in Big Data Analytics and Social Mining 20015Master in Big Data Analytics and Social Mining 20015
Master in Big Data Analytics and Social Mining 20015Andrea Gigli
 
Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016ijcsbi
 
cis5-204-Project-ch11c - Evan, Le, Mata.pdf
cis5-204-Project-ch11c - Evan, Le, Mata.pdfcis5-204-Project-ch11c - Evan, Le, Mata.pdf
cis5-204-Project-ch11c - Evan, Le, Mata.pdfMinhLe595264
 
CML's Presentation at FengChia University
CML's Presentation at FengChia UniversityCML's Presentation at FengChia University
CML's Presentation at FengChia UniversityTunghai University
 
IRJET- Semantics based Document Clustering
IRJET- Semantics based Document ClusteringIRJET- Semantics based Document Clustering
IRJET- Semantics based Document ClusteringIRJET Journal
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsIRJET Journal
 
Locloud - D2.6: Crawler ready tagging tools
Locloud - D2.6: Crawler ready tagging toolsLocloud - D2.6: Crawler ready tagging tools
Locloud - D2.6: Crawler ready tagging toolslocloud
 
IRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLPIRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLPIRJET Journal
 
Towards a Resource Slice Interoperability Hub for IoT
Towards a Resource Slice Interoperability Hub for IoTTowards a Resource Slice Interoperability Hub for IoT
Towards a Resource Slice Interoperability Hub for IoTHong-Linh Truong
 
IRJET- Hosting NLP based Chatbot on AWS Cloud using Docker
IRJET-  	  Hosting NLP based Chatbot on AWS Cloud using DockerIRJET-  	  Hosting NLP based Chatbot on AWS Cloud using Docker
IRJET- Hosting NLP based Chatbot on AWS Cloud using DockerIRJET Journal
 
Comparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonComparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonIRJET Journal
 
IRJET - Speech to Speech Translation using Encoder Decoder Architecture
IRJET -  	  Speech to Speech Translation using Encoder Decoder ArchitectureIRJET -  	  Speech to Speech Translation using Encoder Decoder Architecture
IRJET - Speech to Speech Translation using Encoder Decoder ArchitectureIRJET Journal
 

Similaire à Fast Entity Linking via Graph Embeddings (20)

Gospel - High-performance heterogeneous architectures for graph analytics
 Gospel - High-performance heterogeneous architectures for graph analytics Gospel - High-performance heterogeneous architectures for graph analytics
Gospel - High-performance heterogeneous architectures for graph analytics
 
IRJET- Python Based Machine Learning for Profile Matching
IRJET- 	  Python Based Machine Learning for Profile MatchingIRJET- 	  Python Based Machine Learning for Profile Matching
IRJET- Python Based Machine Learning for Profile Matching
 
Pattern-based Definition and Generation of Components for a Synchronous React...
Pattern-based Definition and Generation of Components for a Synchronous React...Pattern-based Definition and Generation of Components for a Synchronous React...
Pattern-based Definition and Generation of Components for a Synchronous React...
 
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSINGFEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
 
Colombo+ronzoni+fontana
Colombo+ronzoni+fontanaColombo+ronzoni+fontana
Colombo+ronzoni+fontana
 
Master in Big Data Analytics and Social Mining 20015
Master in Big Data Analytics and Social Mining 20015Master in Big Data Analytics and Social Mining 20015
Master in Big Data Analytics and Social Mining 20015
 
Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016
 
CIS 5 Project.pdf
CIS 5 Project.pdfCIS 5 Project.pdf
CIS 5 Project.pdf
 
cis5-204-Project-ch11c - Evan, Le, Mata.pdf
cis5-204-Project-ch11c - Evan, Le, Mata.pdfcis5-204-Project-ch11c - Evan, Le, Mata.pdf
cis5-204-Project-ch11c - Evan, Le, Mata.pdf
 
CML's Presentation at FengChia University
CML's Presentation at FengChia UniversityCML's Presentation at FengChia University
CML's Presentation at FengChia University
 
IRJET- Semantics based Document Clustering
IRJET- Semantics based Document ClusteringIRJET- Semantics based Document Clustering
IRJET- Semantics based Document Clustering
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather Conditions
 
Locloud - D2.6: Crawler ready tagging tools
Locloud - D2.6: Crawler ready tagging toolsLocloud - D2.6: Crawler ready tagging tools
Locloud - D2.6: Crawler ready tagging tools
 
IRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLPIRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLP
 
Towards Edge Computing as a Service: Dynamic Formation of the Micro Data-Centers
Towards Edge Computing as a Service: Dynamic Formation of the Micro Data-CentersTowards Edge Computing as a Service: Dynamic Formation of the Micro Data-Centers
Towards Edge Computing as a Service: Dynamic Formation of the Micro Data-Centers
 
Towards a Resource Slice Interoperability Hub for IoT
Towards a Resource Slice Interoperability Hub for IoTTowards a Resource Slice Interoperability Hub for IoT
Towards a Resource Slice Interoperability Hub for IoT
 
IRJET- Hosting NLP based Chatbot on AWS Cloud using Docker
IRJET-  	  Hosting NLP based Chatbot on AWS Cloud using DockerIRJET-  	  Hosting NLP based Chatbot on AWS Cloud using Docker
IRJET- Hosting NLP based Chatbot on AWS Cloud using Docker
 
Comparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonComparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & Python
 
Visual Network Narrations
Visual Network NarrationsVisual Network Narrations
Visual Network Narrations
 
IRJET - Speech to Speech Translation using Encoder Decoder Architecture
IRJET -  	  Speech to Speech Translation using Encoder Decoder ArchitectureIRJET -  	  Speech to Speech Translation using Encoder Decoder Architecture
IRJET - Speech to Speech Translation using Encoder Decoder Architecture
 

Plus de NECST Lab @ Politecnico di Milano

Embedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingEmbedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingNECST Lab @ Politecnico di Milano
 
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...NECST Lab @ Politecnico di Milano
 
EMPhASIS - An EMbedded Public Attention Stress Identification System
 EMPhASIS - An EMbedded Public Attention Stress Identification System EMPhASIS - An EMbedded Public Attention Stress Identification System
EMPhASIS - An EMbedded Public Attention Stress Identification SystemNECST Lab @ Politecnico di Milano
 
Maeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingMaeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingNECST Lab @ Politecnico di Milano
 

Plus de NECST Lab @ Politecnico di Milano (20)

Mesticheria Team - WiiReflex
Mesticheria Team - WiiReflexMesticheria Team - WiiReflex
Mesticheria Team - WiiReflex
 
Punto e virgola Team - Stressometro
Punto e virgola Team - StressometroPunto e virgola Team - Stressometro
Punto e virgola Team - Stressometro
 
BitIt Team - Stay.straight
BitIt Team - Stay.straight BitIt Team - Stay.straight
BitIt Team - Stay.straight
 
BabYodini Team - Talking Gloves
BabYodini Team - Talking GlovesBabYodini Team - Talking Gloves
BabYodini Team - Talking Gloves
 
printf("Nome Squadra"); Team - NeoTon
printf("Nome Squadra"); Team - NeoTonprintf("Nome Squadra"); Team - NeoTon
printf("Nome Squadra"); Team - NeoTon
 
BlackBoard Team - Motion Tracking Platform
BlackBoard Team - Motion Tracking PlatformBlackBoard Team - Motion Tracking Platform
BlackBoard Team - Motion Tracking Platform
 
#include<brain.h> Team - HomeBeatHome
#include<brain.h> Team - HomeBeatHome#include<brain.h> Team - HomeBeatHome
#include<brain.h> Team - HomeBeatHome
 
Flipflops Team - Wave U
Flipflops Team - Wave UFlipflops Team - Wave U
Flipflops Team - Wave U
 
Bug(atta) Team - Little Brother
Bug(atta) Team - Little BrotherBug(atta) Team - Little Brother
Bug(atta) Team - Little Brother
 
#NECSTCamp: come partecipare
#NECSTCamp: come partecipare#NECSTCamp: come partecipare
#NECSTCamp: come partecipare
 
NECSTCamp101@2020.10.1
NECSTCamp101@2020.10.1NECSTCamp101@2020.10.1
NECSTCamp101@2020.10.1
 
NECSTLab101 2020.2021
NECSTLab101 2020.2021NECSTLab101 2020.2021
NECSTLab101 2020.2021
 
TreeHouse, nourish your community
TreeHouse, nourish your communityTreeHouse, nourish your community
TreeHouse, nourish your community
 
TiReX: Tiled Regular eXpressionsmatching architecture
TiReX: Tiled Regular eXpressionsmatching architectureTiReX: Tiled Regular eXpressionsmatching architecture
TiReX: Tiled Regular eXpressionsmatching architecture
 
Embedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingEmbedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposing
 
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
 
EMPhASIS - An EMbedded Public Attention Stress Identification System
 EMPhASIS - An EMbedded Public Attention Stress Identification System EMPhASIS - An EMbedded Public Attention Stress Identification System
EMPhASIS - An EMbedded Public Attention Stress Identification System
 
Luns - Automatic lungs segmentation through neural network
Luns - Automatic lungs segmentation through neural networkLuns - Automatic lungs segmentation through neural network
Luns - Automatic lungs segmentation through neural network
 
BlastFunction: How to combine Serverless and FPGAs
BlastFunction: How to combine Serverless and FPGAsBlastFunction: How to combine Serverless and FPGAs
BlastFunction: How to combine Serverless and FPGAs
 
Maeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingMaeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matching
 

Dernier

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 

Dernier (20)

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)

Fast Entity Linking via Graph Embeddings

  • 1. Fast Entity Linking via Graph Embeddings. Alberto Parravicini, Rhicheek Patra, Davide Bartolini, Marco Santambrogio. 2019-05-{17-31}, NGCX
  • 2. Alberto Parravicini, 23/05/2019. Entity Linking: Entity Linking (EL) connects words of interest to unique identities (e.g. a Wikipedia page).
  • 3. Use Cases: EL is a component of applications that require high-level representations of text:
      1. Search engines, for semantic search
      2. Recommender systems, to retrieve documents similar to each other
      3. Chatbots, to understand intents and entities
  • 4. The EL Pipeline (1/2): An EL system requires 2 steps:
      1. Named Entity Recognition (NER): spot mentions (a.k.a. Named Entities)
         ● High accuracy in the state of the art [1]
      [1] Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging."
  • 5-7. The EL Pipeline (2/2): An EL system requires 2 steps:
      2. Entity Linking: connect mentions to entities
  • 8. Our contributions:
      ● Novel unsupervised framework for EL
      ● No dependency on NLP
      ● First EL algorithm to use graph embeddings
      ● Accuracy similar to supervised state-of-the-art (SoA) techniques
      ● Highly scalable, with real-time execution
      ● < 1 sec to process text with 30+ mentions
  • 9. Our EL Pipeline: Input text with NER; (1) Graph Building; (2) Embeddings; (3) Candidate Finder; (4) Disambiguation; Output. The diagram groups the steps under two phases, Graph Learning and Execution.
  • 10. Step 1/4, Graph Creation: We obtain a large graph from DBpedia
      ● All the information of Wikipedia, stored as triples
      ● 12M entities, 170M links
  • 11. Step 1/4, Graph Creation: From DBpedia, we build two graphs: a Redirects Graph and a Property Graph.
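A minimal sketch of how triples could be split into the two graphs, assuming that redirect and disambiguation predicates go to the Redirects Graph and everything else goes to the Property Graph. The slides do not state the actual rule; the predicate names, the `build_graphs` helper, and the toy triples are illustrative assumptions.

```python
from collections import defaultdict

# Assumption: these two predicates identify redirect/disambiguation edges.
# The slides do not list the predicates that were actually used.
REDIRECT_PREDICATES = {"dbo:wikiPageRedirects", "dbo:wikiPageDisambiguates"}


def build_graphs(triples):
    """Split (subject, predicate, object) triples into two adjacency maps."""
    redirects_graph = defaultdict(set)
    property_graph = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in REDIRECT_PREDICATES:
            redirects_graph[subject].add(obj)
        else:
            property_graph[subject].add(obj)
    return dict(redirects_graph), dict(property_graph)


# Toy triples, for illustration only.
triples = [
    ("dbr:Trump", "dbo:wikiPageDisambiguates", "dbr:Donald_Trump"),
    ("dbr:Example_Alias", "dbo:wikiPageRedirects", "dbr:Donald_Trump"),
    ("dbr:Donald_Trump", "dbo:birthPlace", "dbr:New_York_City"),
]
redirects_graph, property_graph = build_graphs(triples)
```

Keeping the two edge types apart means the Redirects Graph can be used only for name resolution, while the Property Graph carries the links between entities.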
  • 12. Step 2/4, Embeddings Creation (1/2):
      ● Graph embeddings encode vertices as vectors
      ● “Similar” vertices have “similar” embeddings
      ● Idea: entities with the same context should have low embedding distance
  • 13. Step 2/4, Embeddings Creation (2/2):
      ● In our work, we use DeepWalk [1]
      ● It leverages random walks (i.e. vertex sequences) to create embeddings, treating them like the sentences used by word2vec [2]
      ● Embedding size 170, walk length 8
      ● DeepWalk uses only the graph topology
      ● It is a simple baseline: better algorithms that also leverage graph features could be used
      [1] Bryan Perozzi et al. 2014. DeepWalk.
      [2] Tomas Mikolov et al. 2013. Distributed representations of words and phrases and their compositionality.
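The random-walk stage of a DeepWalk-style pipeline can be sketched as follows. This is only the walk generation, with the walk length of 8 taken from the slide; the number of walks per vertex, the seed, and the `random_walks` helper are assumptions. Training a skip-gram model on the walks to obtain the 170-dimensional vectors is not shown.

```python
import random


def random_walks(adjacency, walk_length=8, walks_per_vertex=2, seed=0):
    """Generate uniform random walks; each walk is a list of vertices."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_vertex):
        vertices = list(adjacency)
        rng.shuffle(vertices)  # visit start vertices in random order
        for start in vertices:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency.get(walk[-1], [])
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Toy graph: a triangle, so no walk hits a dead end.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
walks = random_walks(graph)
```

Each walk plays the role of a sentence and each vertex the role of a word, which is why a word2vec-style model can be trained on them unchanged.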
  • 14-15. Step 3/4, Candidates Finder: Idea: for each mention, select a few candidate vertices with index-based string similarity
      ● Resolve ambiguity by following redirect and disambiguation edges
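Following redirect and disambiguation edges can be sketched as a bounded traversal that replaces a matched vertex with the concrete entities it points to. The `expand_candidate` helper, the hop limit, and the toy vertex names are assumptions for illustration, not the authors' implementation.

```python
def expand_candidate(vertex, redirects, disambiguations, max_hops=5):
    """Return the concrete entities reachable from a matched vertex.

    redirects maps a vertex to one target; disambiguations maps a vertex
    to several possible targets. A vertex in neither map is an entity.
    """
    seen = set()
    frontier = [vertex]
    entities = set()
    for _ in range(max_hops + 1):
        next_frontier = []
        for v in frontier:
            if v in seen:  # guard against redirect cycles
                continue
            seen.add(v)
            if v in redirects:
                next_frontier.append(redirects[v])
            elif v in disambiguations:
                next_frontier.extend(disambiguations[v])
            else:
                entities.add(v)
        frontier = next_frontier
        if not frontier:
            break
    return entities


# Toy data, for illustration only.
redirects = {"Trump_(president)": "Donald_Trump"}
disambiguations = {"Trump": ["Donald_Trump", "Trump_(card_games)"]}
expanded = expand_candidate("Trump", redirects, disambiguations)
```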
  • 16. Step 4/4, Disambiguation (1/3):
      ● We want to pick the “best” candidate for each mention
      ● In a “good” solution, candidates are related to each other (e.g. Donald Trump, Hillary Clinton)
      ● Observation: a good tuple of candidates has embeddings close to each other
      ● Evaluating all combinations is infeasible
      ● 10 mentions with 100 candidates each give 100^10 combinations
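To make the size of the search space concrete, the count can be worked out directly. The rate of 10^9 tuple evaluations per second is an assumed, optimistic figure and does not come from the slides.

```latex
\[
  100^{10} = (10^{2})^{10} = 10^{20} \ \text{candidate tuples}
\]
\[
  \frac{10^{20}\ \text{tuples}}{10^{9}\ \text{tuples/s}} = 10^{11}\ \text{s}
  \approx \frac{10^{11}}{3.15 \times 10^{7}}\ \text{years}
  \approx 3.2 \times 10^{3}\ \text{years}
\]
```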
  • 17. Step 4/4, Disambiguation (2/3): We use a heuristic state-space search algorithm to maximize an objective that combines the sum of string similarities with the sum of embedding cosine similarities w.r.t. the tuple mean.
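One way to write such an objective is sketched below, assuming the two terms are simply added with equal weight; the slide does not give the exact formula or any weighting. The `tuple_score` and `cosine` helpers and the toy vectors are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tuple_score(embeddings, string_sims):
    """Score a tuple holding one candidate per mention.

    embeddings: one vector per chosen candidate.
    string_sims: string similarity between each mention and its candidate.
    Assumption: the two sums are added with equal weight.
    """
    mean = [sum(column) / len(embeddings) for column in zip(*embeddings)]
    coherence = sum(cosine(e, mean) for e in embeddings)
    return sum(string_sims) + coherence


# Two candidates pointing in the same direction form a coherent tuple...
coherent = tuple_score([[1.0, 0.0], [1.0, 0.0]], [1.0, 0.5])
# ...while orthogonal candidates are penalised by the cosine term.
incoherent = tuple_score([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.5])
```

Comparing each candidate against the tuple mean keeps the cost linear in the number of mentions, whereas comparing all pairs of candidates would be quadratic.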
  • 18. Step 4/4, Disambiguation (3/3): Iterative state-space heuristic: a greedy iterative procedure.
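A greedy iterative search of this kind might look like the sketch below: start from the best string match for each mention, then repeatedly swap one mention's candidate whenever the swap raises the tuple score. This is one plausible greedy procedure, not necessarily the one used in the talk; the scoring function, the candidate format, and the toy data are assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score(choice):
    """choice: list of (name, string_similarity, embedding), one per mention."""
    embeddings = [c[2] for c in choice]
    mean = [sum(column) / len(embeddings) for column in zip(*embeddings)]
    return sum(c[1] for c in choice) + sum(cosine(e, mean) for e in embeddings)


def greedy_disambiguate(candidates, max_iterations=10):
    """candidates: for each mention, a list of (name, string_sim, embedding)."""
    # Initial state: the best string match for every mention.
    choice = [max(cands, key=lambda c: c[1]) for cands in candidates]
    for _ in range(max_iterations):
        improved = False
        for i, cands in enumerate(candidates):
            # Try every candidate for mention i, keeping the others fixed.
            best = max(cands, key=lambda c: score(choice[:i] + [c] + choice[i + 1:]))
            if score(choice[:i] + [best] + choice[i + 1:]) > score(choice):
                choice[i] = best
                improved = True
        if not improved:  # early stop: no single swap helps any more
            break
    return [c[0] for c in choice]


# Toy candidates with 2-dimensional embeddings, for illustration only.
candidates = [
    [("Donald_Trump", 0.6, [1.0, 0.0]), ("Trump_(card_games)", 0.7, [0.0, 1.0])],
    [("Hillary_Clinton", 0.7, [0.9, 0.1]), ("Clinton,_Iowa", 0.6, [0.1, 0.9])],
]
linked = greedy_disambiguate(candidates)
```

In this toy case the string match alone would pick the card-game sense for the first mention; the embedding term pulls the choice towards the candidate that agrees with the second mention.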
  • 19. Results: accuracy
      ● We compared against 6 SoA EL algorithms, on 5 datasets
      ● Our micro-averaged F1 score is comparable with SoA supervised algorithms
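For reference, micro-averaged F1 pools true positives, false positives, and false negatives over all documents before computing precision and recall. The sketch below shows the standard definition; the representation of predictions as sets of (mention, entity) pairs and the `micro_f1` helper are assumptions, and the slides do not describe the exact evaluation script.

```python
def micro_f1(documents):
    """documents: list of (predicted_links, gold_links) pairs of sets."""
    tp = fp = fn = 0
    for predicted, gold in documents:
        tp += len(predicted & gold)   # links that are correct
        fp += len(predicted - gold)   # links predicted but wrong
        fn += len(gold - predicted)   # gold links that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy evaluation data, for illustration only.
documents = [
    ({("m1", "A"), ("m2", "B")}, {("m1", "A"), ("m2", "C")}),
    ({("m3", "D")}, {("m3", "D")}),
]
f1 = micro_f1(documents)
```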
  • 20. Results: exec. time
      ● Different settings enable real-time EL, with minimal loss in accuracy
      ● E.g. number of iterations, early stop
  • 21. Results: exec. time of single steps: Execution time is well balanced between Candidate Finder and Disambiguation.
  • 22. Thank you! Fast Entity Linking via Graph Embeddings
      ● Novel unsupervised framework for EL
      ● First EL algorithm to use graph embeddings
        ○ Accuracy similar to supervised SoA techniques
      ● Real-time execution time
      Alberto Parravicini (alberto.parravicini@polimi.it), Rhicheek Patra, Davide Bartolini, Marco Santambrogio. 2019-05-{17-31}, NGCX
  • 23. Step 2/4, Embeddings:
      ● Turn topology and properties of each vertex into a vector
      ● “Similar” vertices have “similar” embeddings
      (Figure labels: topology, properties, embedding.)
      Hamilton, William L., Rex Ying, and Jure Leskovec. "Representation learning on graphs: Methods and applications." arXiv preprint arXiv:1709.05584 (2017).
  • 24. Step 1/4, Graph Creation: … and join them together
  • 25. Step 3/4, Candidates Finder: Idea: for each mention, select a small number of candidate vertices with string similarity
      ● We use a simple index-based string search
      ● Fuzzy matching with 2-grams and 3-grams
      ● This provides a simple baseline (60-70% accuracy)
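An index-based fuzzy search over 2-grams and 3-grams can be sketched with an inverted index from character n-grams to entity labels. Ranking by Jaccard overlap is an assumption, since the slides do not name the similarity measure; the helper names and toy labels are illustrative.

```python
from collections import defaultdict


def ngrams(text, sizes=(2, 3)):
    """Set of lowercase character 2-grams and 3-grams of text."""
    text = text.lower()
    return {text[i:i + n] for n in sizes for i in range(len(text) - n + 1)}


def build_index(labels):
    """Inverted index: n-gram -> labels that contain it."""
    index = defaultdict(set)
    for label in labels:
        for gram in ngrams(label):
            index[gram].add(label)
    return index


def find_candidates(mention, index, top_k=3):
    """Return up to top_k labels ranked by n-gram Jaccard similarity."""
    grams = ngrams(mention)
    overlap = defaultdict(int)
    for gram in grams:
        for label in index.get(gram, ()):
            overlap[label] += 1  # shared n-grams between mention and label

    def jaccard(label):
        return overlap[label] / len(grams | ngrams(label))

    # Sort by similarity, breaking ties alphabetically for determinism.
    return sorted(overlap, key=lambda label: (-jaccard(label), label))[:top_k]


# Toy labels, for illustration only; note the misspelled mention.
index = build_index(["Donald Trump", "Donald Duck", "Hillary Clinton"])
results = find_candidates("Donld Trump", index)
```

Only labels that share at least one n-gram with the mention are scored, so the lookup cost depends on the mention rather than on the total number of entities.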