06/01/17 Heiko Paulheim 1
Data-driven Joint Debugging
of the DBpedia Mappings and Ontology
Towards Addressing the Causes
instead of the Symptoms of Data Quality in DBpedia
Heiko Paulheim
Motivation
• Various works on finding errors in Knowledge Graphs
– 2017 survey: 17 approaches
– 15/17 are evaluated on DBpedia
• Question:
– How does DBpedia benefit
from those works?
H. Paulheim: Knowledge Graph Refinement – A Survey
of Approaches and Evaluation Methods. SWJ 8(3), 2017
Motivation
• What comes out of those research works
– A list of (possibly) wrong statements
– Source code for finding erroneous statements
– ...
Motivation
• Possible option 1: Remove erroneous triples from DBpedia
• Challenges
– May remove correct axioms, may need thresholding
– Needs to be repeated for each release
– Needs to be materialized on all of DBpedia
[Diagram: Wikipedia and the DBpedia Mappings Wiki feed the DBpedia Extraction Framework; a post filter is applied to the extracted DBpedia]
Motivation
• Materialized on full DBpedia: 8/15 approaches
Motivation
• Possible option 2: Integrate into DBpedia Extraction Framework
• Challenges
– Development workload
– Some approaches are not fully automated (technically or conceptually)
– Scalability
[Diagram: Wikipedia and the DBpedia Mappings Wiki feed the DBpedia Extraction Framework, which is extended with a filter module]
Motivation
• Scalability analyzed: 6/15 approaches
Disclaimer: this does not imply that an approach is actually scalable!
Motivation
• Do we have a third option?
– Paulheim & Gangemi (2015): >95% of all inconsistencies in DBpedia
boil down to 40 common root causes
[Diagram: Wikipedia and the DBpedia Mappings Wiki feed the DBpedia Extraction Framework; inconsistency detection on the extracted data leads to the identification of suspicious mappings and ontology constructs]
H. Paulheim, A. Gangemi: Serving DBpedia with DOLCE – More than Just Adding a Cherry on Top (ISWC 2015)
Disclaimer: not equivalent to
“wrong statements”
Approach
[Figure: example graph. dbr:Agua_Caliente_Airport (rdf:type dbo:Airport, foaf:name “Agua Caliente Airport”) is linked via dbo:operator to dbr:San_Diego_County,_California (rdf:type dbo:Settlement). Further classes shown: dbo:Infrastructure, dbo:ArchitecturalStructure, dbo:PopulatedPlace, dbo:Place, dbo:Organisation, dbo:Agent. Further edge labels: rdfs:range, owl:disjointWith. Annotation: “Obama-free example!”]
Approach
• Find inconsistencies in extracted statements
– Using the DBpedia ontology with DOLCE as top-level ontology
• Trace them back to mappings
– In the example, there are three candidates
• Property mapping to the predicate dbo:operator
• Class mapping (subject) to dbo:Airport
• Class mapping (object) to dbo:Settlement
• Unfortunately, provenance information for DBpedia
is not that fine-grained
– i.e., we do not know which mapping was responsible for which
statement in the end
– first step: heuristic reconstruction
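The detection and tracing steps can be illustrated with a small sketch. It is not the implementation behind the talk: the toy ontology hard-codes only the classes, the range and the disjointness axiom read off the introductory example, it checks ranges only (domains would work analogously), and the function names `superclasses`, `is_inconsistent` and `candidate_mapping_elements` are hypothetical.

```python
# Toy ontology fragment, assumed from the introductory example (not loaded from DBpedia).
SUBCLASS_OF = {
    "dbo:Airport": "dbo:Infrastructure",
    "dbo:Infrastructure": "dbo:ArchitecturalStructure",
    "dbo:ArchitecturalStructure": "dbo:Place",
    "dbo:Settlement": "dbo:PopulatedPlace",
    "dbo:PopulatedPlace": "dbo:Place",
    "dbo:Organisation": "dbo:Agent",
}
RANGE = {"dbo:operator": "dbo:Organisation"}
DISJOINT = {frozenset({"dbo:Agent", "dbo:Place"})}


def superclasses(cls):
    """Return cls together with all of its ancestors in the toy hierarchy."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def is_inconsistent(predicate, object_type):
    """True if the object's type clashes with the predicate's range via disjointness."""
    range_cls = RANGE.get(predicate)
    if range_cls is None:
        return False
    implied = superclasses(range_cls)      # types implied by rdfs:range
    asserted = superclasses(object_type)   # types implied by the class mapping
    return any(frozenset({a, b}) in DISJOINT for a in implied for b in asserted)


def candidate_mapping_elements(subject_type, predicate, object_type):
    """List the mapping elements that may have caused an inconsistent statement."""
    return [
        ("property mapping", predicate),
        ("class mapping (subject)", subject_type),
        ("class mapping (object)", object_type),
    ]


if is_inconsistent("dbo:operator", "dbo:Settlement"):
    for kind, element in candidate_mapping_elements(
        "dbo:Airport", "dbo:operator", "dbo:Settlement"
    ):
        print(kind, element)
```

For the airport example this lists the same three candidates as the slide; deciding which of them is the actual culprit is left to the scoring described next.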
Approach: Identifying Mapping Elements
• We use the RML representation of the Mapping Wiki contents [1]
https://www.w3.org/TR/r2rml/
[Figure: RML mapping excerpt; highlighted elements: Wikipedia Page, DBpedia Resource]
[1] Dimou et al.: DBpedia Mappings Quality Assessment (ISWC Poster 2016)
[Figure: RML mapping excerpt; highlighted element: DBpedia Ontology Class]
[Figure: RML mapping excerpt; highlighted element: DBpedia Ontology Property]
Approach (ctd.)
• Once the mappings have been heuristically reconstructed, we can determine
– How often is a mapping element involved in an inconsistency?
– How often is a mapping element used, but not involved in an
inconsistency?
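A minimal sketch of this counting step, assuming each extracted statement has already been annotated with the mapping elements that (heuristically) produced it and with a flag saying whether it is involved in an inconsistency. The input format and the name `count_mapping_usage` are hypothetical.

```python
from collections import Counter


def count_mapping_usage(statements):
    """Count, per mapping element, consistent and inconsistent uses.

    statements: iterable of (mapping_elements, is_inconsistent) pairs.
    Returns two Counters: uses without and uses with an inconsistency.
    """
    consistent = Counter()
    inconsistent = Counter()
    for elements, is_inconsistent in statements:
        target = inconsistent if is_inconsistent else consistent
        for element in elements:
            target[element] += 1
    return consistent, inconsistent


# Hypothetical toy input: three statements and the elements that produced them.
statements = [
    (["dbo:operator", "dbo:Airport", "dbo:Settlement"], True),
    (["dbo:operator", "dbo:Airport", "dbo:Company"], False),
    (["dbo:location", "dbo:Airport", "dbo:Settlement"], False),
]
consistent, inconsistent = count_mapping_usage(statements)
print(inconsistent["dbo:operator"], consistent["dbo:operator"])
```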
Approach (ctd.)
• Using the two counters c_m and i_m, we can compute two scores for the hypothesis that m is problematic
• Borrowed from Association Rule Mining (support and confidence)
[Formulas for support and confidence not preserved]
• N is the total number of statements in DBpedia
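A plausible reading of the two scores, following the usual association-rule definitions. It assumes that i_m counts the statements of m that are involved in an inconsistency and c_m those that are not; the definitions used in the talk may differ in detail.

```python
def support(i_m, n):
    """Share of all N statements that involve m and are inconsistent."""
    return i_m / n


def confidence(i_m, c_m):
    """Share of the statements involving m that are inconsistent."""
    used = i_m + c_m
    return i_m / used if used else 0.0


# Hypothetical numbers: m is used 100 times, 30 of them inconsistently,
# in a graph of 1000 statements.
print(support(30, 1000))
print(confidence(30, 70))
```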
Identifying Interesting Problems
• Hypothesis: high support and high confidence mapping elements
hint at problems worth investigating
– High support: fixing the issue would fix a lot of individual statements
– High confidence: this mapping element actually hints at the root cause
• i.e., fixing this does not break many other things
• Unfortunately, both come at different scales
– Difficult to use average, harmonic mean or the like
– Support: μ = 0.0002, σ = 0.003
– Confidence: μ = 0.114, σ = 0.260
• Fix: use logarithmic support instead
– LogSupport: μ = 0.179, σ = 0.139
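The slide does not say how the logarithm is normalized. One option that keeps the value in [0, 1], and therefore on a scale comparable to confidence, is sketched below; the formula log(i_m + 1) / log(N + 1) is an assumption, not taken from the slides.

```python
import math


def log_support(i_m, n):
    """Logarithmic support, normalized to [0, 1] (assumed normalization)."""
    return math.log(i_m + 1) / math.log(n + 1)


# A mapping element behind 30 of 1000 statements has a plain support of 0.03,
# while its log support is spread much further across the unit interval.
print(log_support(30, 1000))
```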
Identifying Interesting Problems (ctd.)
• Inspect mappings that have a high harmonic mean of
confidence and log support
[Figure: plot with marks at 0.25, 0.5 and 0.75 and the annotation “more interesting”]
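The ranking can be sketched as follows, taking confidence and log support as given inputs. The mapping names and score values are made up for illustration, and `overall_score` is a hypothetical name.

```python
def overall_score(confidence, log_support):
    """Harmonic mean of confidence and log support (0 if both are 0)."""
    total = confidence + log_support
    if total == 0:
        return 0.0
    return 2 * confidence * log_support / total


# Hypothetical candidates: (confidence, log support)
candidates = {
    "m1": (0.9, 0.6),   # high on both scores
    "m2": (0.9, 0.05),  # confident, but affects few statements
    "m3": (0.2, 0.7),   # frequent, but mostly used consistently
}
ranked = sorted(candidates, key=lambda m: overall_score(*candidates[m]), reverse=True)
print(ranked)
```

The harmonic mean is dominated by the smaller of the two values, so a mapping element only ranks high if it is both frequently and predominantly involved in inconsistencies.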
Example Findings
• Case 1: Mapping to wrong property
• Example:
– branch in infobox military unit
is mapped to dbo:militaryBranch
• but dbo:militaryBranch
has dbo:Person as its domain
– correction: dbo:commandStructure
– Overall score: 0.721
– Affects 12,172 statements
(31% of all dbo:militaryBranch)
Example Findings
• Case 2: Mappings that should be removed
• Example:
– dbo:picture
– Most of them are inconsistent (64.5% places, 23.0% persons)
– Reason: statements are extracted from picture caption
dbr:Brixton_Academy dbo:picture dbr:Brixton .
dbr:Justify_My_Love dbo:picture dbr:Madonna_(entertainer) .
Example Findings
• Case 3: Ontology problems (domain/range)
• Example 1:
– Populated places (e.g., cities) are used both as place and organization
– For some properties, the range is either one of the two
• e.g., dbo:operator (see introductory example)
– Polysemy should be reflected in the ontology
• Example 2:
– dbo:architect, dbo:designer, dbo:engineer etc.
have dbo:Person as their range
– Significant fractions (8.6%, 7.6%, 58.4%, resp.)
have a dbo:Organisation as object
– Range should be broadened
Example Findings
• Case 4: Missing properties
• Example 1:
– dbo:president links an organization to its president
– Majority use (8,354, or 76.2%):
link a person to the president s/he served for
• Example 2:
– dbo:instrument links an artist
to the instrument s/he plays
– Prominent alternative use (3,828, or 7.2%):
links a genre to its characteristic instrument
Obama example alert!
Future Work
• Classify ontology, mapping, and other errors automatically
– Currently ongoing: using different language editions of DBpedia
• Heuristic:
– Problem present in many languages → ontology problem
– Problem present in only one language → mapping problem
• From post-processing to live processing
– e.g., on-the-fly validation in DBpedia Mappings Wiki
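The language-edition heuristic above could look roughly like this. The threshold for "many" languages is not given on the slide, so the parameter `many` and its default are assumptions, as are the function name and the "undecided" case for counts in between.

```python
def classify_problem(languages_with_problem, many=3):
    """Guess the source of a problem from the language editions it appears in."""
    count = len(set(languages_with_problem))
    if count >= many:
        return "ontology problem"   # shows up across many editions
    if count == 1:
        return "mapping problem"    # specific to one edition's mappings
    return "undecided"              # nothing observed, or too few editions


print(classify_problem(["en"]))
print(classify_problem(["en", "de", "fr", "it"]))
```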
Take Aways
• Fixing bugs in knowledge graphs is nice
– But often a one-time solution
– Preserving the efforts is hard
• Proposed solution
– Identify and address the root problem
– Scoring mechanism helps identify interesting problems
– Preserving the efforts by eliminating
the root causes
• Provenance matters!
– The more we know about how a statement gets into a knowledge graph, the better we can automate the error analysis
06/01/17 Heiko Paulheim 24
Data-driven Joint Debugging
of the DBpedia Mappings and Ontology
Towards Addressing the Causes
instead of the Symptoms of Data Quality in DBpedia
Heiko Paulheim

Contenu connexe

Tendances

Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionIdentifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionHeiko Paulheim
 
The Semantic Web - Interacting with the Unknown
The Semantic Web - Interacting with the UnknownThe Semantic Web - Interacting with the Unknown
The Semantic Web - Interacting with the UnknownSteffen Staab
 
Timeliner: Early Ideas
Timeliner: Early IdeasTimeliner: Early Ideas
Timeliner: Early IdeasDavid Lamas
 

Tendances (8)

Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionIdentifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
 
The Semantic Web - Interacting with the Unknown
The Semantic Web - Interacting with the UnknownThe Semantic Web - Interacting with the Unknown
The Semantic Web - Interacting with the Unknown
 
Timeliner: Early Ideas
Timeliner: Early IdeasTimeliner: Early Ideas
Timeliner: Early Ideas
 
Timeliner, early ideas
Timeliner, early ideasTimeliner, early ideas
Timeliner, early ideas
 

Similaire à Data-driven Joint Debugging of the DBpedia Mappings and Ontology

What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfHeiko Paulheim
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingHeiko Paulheim
 
Type Inference on Noisy RDF Data
Type Inference on Noisy RDF DataType Inference on Noisy RDF Data
Type Inference on Noisy RDF DataHeiko Paulheim
 
DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesPetar Ristoski
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge GraphsHeiko Paulheim
 
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...Heiko Paulheim
 
discussion_3_project.pdf
discussion_3_project.pdfdiscussion_3_project.pdf
discussion_3_project.pdfKuan-Tsae Huang
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphHeiko Paulheim
 
Meet a 100% R-based CRO. The summary of a 5-year journey
Meet a 100% R-based CRO. The summary of a 5-year journeyMeet a 100% R-based CRO. The summary of a 5-year journey
Meet a 100% R-based CRO. The summary of a 5-year journeyAdrian Olszewski
 
Meet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journeyMeet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journeyAdrian Olszewski
 
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph MiningDetection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph MiningMikel Emaldi Manrique
 
DBT04-ER-Models-v1.pdf
DBT04-ER-Models-v1.pdfDBT04-ER-Models-v1.pdf
DBT04-ER-Models-v1.pdfNermeenKamel7
 
Introducing RDA: June 2013
Introducing RDA: June 2013Introducing RDA: June 2013
Introducing RDA: June 2013ALATechSource
 
ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13Simeon Warner
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) SkillsOscar Corcho
 
Chapter 1 Short Slide.pdf
Chapter 1 Short Slide.pdfChapter 1 Short Slide.pdf
Chapter 1 Short Slide.pdfGirmaNeshir
 
Series 2, Texas A&M University Libraries LibGuides Town Hall Meeting
Series 2, Texas A&M University Libraries LibGuides Town Hall MeetingSeries 2, Texas A&M University Libraries LibGuides Town Hall Meeting
Series 2, Texas A&M University Libraries LibGuides Town Hall Meetinglmrey_tamul
 
Diary of a Wimpy Model Manager
Diary of a Wimpy Model ManagerDiary of a Wimpy Model Manager
Diary of a Wimpy Model ManagerEric Stephan
 
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014The Hive
 

Similaire à Data-driven Joint Debugging of the DBpedia Mappings and Ontology (20)

What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph Profiling
 
Type Inference on Noisy RDF Data
Type Inference on Noisy RDF DataType Inference on Noisy RDF Data
Type Inference on Noisy RDF Data
 
DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spaces
 
DBpedia Ontology and Mapping Problems
DBpedia Ontology and Mapping ProblemsDBpedia Ontology and Mapping Problems
DBpedia Ontology and Mapping Problems
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge Graphs
 
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...
 
discussion_3_project.pdf
discussion_3_project.pdfdiscussion_3_project.pdf
discussion_3_project.pdf
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
 
Meet a 100% R-based CRO. The summary of a 5-year journey
Meet a 100% R-based CRO. The summary of a 5-year journeyMeet a 100% R-based CRO. The summary of a 5-year journey
Meet a 100% R-based CRO. The summary of a 5-year journey
 
Meet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journeyMeet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journey
 
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph MiningDetection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
 
DBT04-ER-Models-v1.pdf
DBT04-ER-Models-v1.pdfDBT04-ER-Models-v1.pdf
DBT04-ER-Models-v1.pdf
 
Introducing RDA: June 2013
Introducing RDA: June 2013Introducing RDA: June 2013
Introducing RDA: June 2013
 
ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) Skills
 
Chapter 1 Short Slide.pdf
Chapter 1 Short Slide.pdfChapter 1 Short Slide.pdf
Chapter 1 Short Slide.pdf
 
Series 2, Texas A&M University Libraries LibGuides Town Hall Meeting
Series 2, Texas A&M University Libraries LibGuides Town Hall MeetingSeries 2, Texas A&M University Libraries LibGuides Town Hall Meeting
Series 2, Texas A&M University Libraries LibGuides Town Hall Meeting
 
Diary of a Wimpy Model Manager
Diary of a Wimpy Model ManagerDiary of a Wimpy Model Manager
Diary of a Wimpy Model Manager
 
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
 

Plus de Heiko Paulheim

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...Heiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Heiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Heiko Paulheim
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Heiko Paulheim
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterHeiko Paulheim
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesHeiko Paulheim
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataHeiko Paulheim
 
Linked Open Data enhanced Knowledge Discovery
Linked Open Data enhanced  Knowledge DiscoveryLinked Open Data enhanced  Knowledge Discovery
Linked Open Data enhanced Knowledge DiscoveryHeiko Paulheim
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerHeiko Paulheim
 
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Heiko Paulheim
 
Extending DBpedia with Wikipedia List Pages
Extending DBpedia with Wikipedia List PagesExtending DBpedia with Wikipedia List Pages
Extending DBpedia with Wikipedia List PagesHeiko Paulheim
 

Plus de Heiko Paulheim (12)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia Entities
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open Data
 
Linked Open Data enhanced Knowledge Discovery
Linked Open Data enhanced  Knowledge DiscoveryLinked Open Data enhanced  Knowledge Discovery
Linked Open Data enhanced Knowledge Discovery
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMiner
 
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
 
Extending DBpedia with Wikipedia List Pages
Extending DBpedia with Wikipedia List PagesExtending DBpedia with Wikipedia List Pages
Extending DBpedia with Wikipedia List Pages
 

Dernier

Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 

Dernier (20)

Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Identifying Appropriate Test Statistics Involving Population Mean

Data-driven Joint Debugging of the DBpedia Mappings and Ontology

  • 1. 06/01/17 Heiko Paulheim 1 Data-driven Joint Debugging of the DBpedia Mappings and Ontology Towards Addressing the Causes instead of the Symptoms of Data Quality in DBpedia Heiko Paulheim
  • 2. Motivation • Various works on finding errors in Knowledge Graphs – 2017 survey: 17 approaches – 15/17 are evaluated on DBpedia • Question: – How does DBpedia benefit from those works? H. Paulheim: Knowledge Graph Refinement – A Survey of Approaches and Evaluation Methods. SWJ 8(3), 2017
  • 3. Motivation • What comes out of those research works – A list of (possibly) wrong statements – Source code for finding erroneous statements – ...
  • 4. Motivation • Possible option 1: Remove erroneous triples from DBpedia • Challenges – May remove correct axioms, may need thresholding – Needs to be repeated for each release – Needs to be materialized on all of DBpedia [Diagram labels: Wikipedia, DBpedia Mappings Wiki, DBpedia Extraction Framework, Post Filter]
  • 5. Motivation • Materialized on full DBpedia: 8/15 approaches
  • 6. Motivation • Possible option 2: Integrate into DBpedia Extraction Framework • Challenges – Development workload – Some approaches are not fully automated (technically or conceptually) – Scalability [Diagram labels: Wikipedia, DBpedia Mappings Wiki, DBpedia Extraction Framework plus filter module]
  • 7. Motivation • Scalability analyzed: 6/15 • Disclaimer: does not imply that it is actually scalable!
  • 8. Motivation • Do we have a third option? – Paulheim & Gangemi (2015): >95% of all inconsistencies in DBpedia boil down to 40 common root causes • Disclaimer: not equivalent to “wrong statements” [Diagram labels: Wikipedia, DBpedia Mappings Wiki, DBpedia Extraction Framework, Inconsistency Detection, Identification of suspicious mappings and ontology constructs] H. Paulheim, A. Gangemi: Serving DBpedia with DOLCE – More than Just Adding a Cherry on Top (ISWC 2015)
  • 9. Approach [Figure: example graph. Statement: dbr:Agua_Caliente_Airport dbo:operator dbr:San_Diego_County,_California, with foaf:name “Agua Caliente Airport”. Other node and edge labels: rdf:type, dbo:Airport, dbo:Infrastructure, dbo:ArchitecturalStructure, dbo:Place, dbo:Settlement, dbo:PopulatedPlace, dbo:Organisation, dbo:Agent, rdfs:range, owl:disjointWith] Obama-free example!
  • 10. Approach • Find inconsistencies in extracted statements – Using DBpedia and DOLCE as top level ontology • Trace them back to mappings – In the example, there are three candidates • Property mapping to the predicate dbo:operator • Class mapping (subject) to dbo:Airport • Class mapping (object) to dbo:Settlement • Unfortunately, provenance information for DBpedia is not that fine-grained – i.e., we do not know which mapping was ultimately responsible for which statement – First step: heuristic reconstruction
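
The inconsistency in the introductory example (slides 9 and 10) can be reproduced with a small sketch. The subclass, range and disjointness axioms below are the ones named on the example slide; the dictionaries and the functions `superclasses` and `violates_range` are hypothetical, and a real setup would run a reasoner over DBpedia plus DOLCE rather than this hand-rolled check.

```python
# Toy ontology fragment from the example slide (not the full DBpedia ontology).
SUBCLASS_OF = {
    "dbo:Airport": "dbo:Infrastructure",
    "dbo:Infrastructure": "dbo:ArchitecturalStructure",
    "dbo:ArchitecturalStructure": "dbo:Place",
    "dbo:Settlement": "dbo:PopulatedPlace",
    "dbo:PopulatedPlace": "dbo:Place",
    "dbo:Organisation": "dbo:Agent",
}
DISJOINT = {frozenset({"dbo:Place", "dbo:Agent"})}
RANGE = {"dbo:operator": "dbo:Organisation"}
TYPES = {
    "dbr:Agua_Caliente_Airport": "dbo:Airport",
    "dbr:San_Diego_County,_California": "dbo:Settlement",
}


def superclasses(cls):
    """Return cls together with all of its ancestors."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def violates_range(subject, predicate, obj):
    """True if the object's type is disjoint with the predicate's range.

    The subject is unused here; a domain check would mirror this logic.
    """
    if predicate not in RANGE or obj not in TYPES:
        return False
    asserted = superclasses(TYPES[obj])
    required = superclasses(RANGE[predicate])
    return any(frozenset({a, r}) in DISJOINT for a in asserted for r in required)


statement = (
    "dbr:Agua_Caliente_Airport",
    "dbo:operator",
    "dbr:San_Diego_County,_California",
)
print(violates_range(*statement))  # True: a Settlement is a Place, the range demands an Agent
```

The check only flags the statement; it does not say which of the three candidate mapping elements is to blame, which is why the counting step that follows is needed.
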
  • 11. Approach: Identifying Mapping Elements • We use the RML representation of the Mapping Wiki contents [1] (see also https://www.w3.org/TR/r2rml/) [Figure labels: Wikipedia Page, DBpedia Resource] [1] Dimou et al.: DBpedia Mappings Quality Assessment (ISWC Poster 2016)
  • 12. Approach: Identifying Mapping Elements • We use the RML representation of the Mapping Wiki contents [1] (see also https://www.w3.org/TR/r2rml/) [Figure label: DBpedia Ontology Class] [1] Dimou et al.: DBpedia Mappings Quality Assessment (ISWC Poster 2016)
  • 13. Approach: Identifying Mapping Elements • We use the RML representation of the Mapping Wiki contents [1] (see also https://www.w3.org/TR/r2rml/) [Figure label: DBpedia Ontology Property] [1] Dimou et al.: DBpedia Mappings Quality Assessment (ISWC Poster 2016)
  • 14. Approach (ctd.) • After heuristically reconstructing the mappings, we can determine – How often is a mapping element involved in an inconsistency? – How often is a mapping element used, but not involved in an inconsistency?
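
The two questions on this slide amount to keeping two counters per mapping element. A minimal sketch, assuming each statement has already been paired with its heuristically reconstructed mapping elements and an inconsistency flag; the function `count_usage` and the element labels are made up for illustration:

```python
from collections import Counter


def count_usage(statements):
    """Count mapping-element usage.

    statements: iterable of (mapping_elements, is_inconsistent) pairs.
    Returns (inconsistent_counts, consistent_counts).
    """
    inconsistent = Counter()  # element involved in an inconsistency
    consistent = Counter()    # element used, but not involved in one
    for elements, is_inconsistent in statements:
        target = inconsistent if is_inconsistent else consistent
        target.update(set(elements))
    return inconsistent, consistent


# Hypothetical input: three statements with their candidate mapping elements.
statements = [
    ({"property:dbo:operator", "class:dbo:Airport", "class:dbo:Settlement"}, True),
    ({"property:dbo:operator", "class:dbo:Airport"}, False),
    ({"class:dbo:Airport"}, False),
]
inconsistent, consistent = count_usage(statements)
print(inconsistent["class:dbo:Airport"], consistent["class:dbo:Airport"])  # 1 2
```

Every candidate element of an inconsistent statement is charged, because the provenance is too coarse to single out the culprit; the scores on the next slides are meant to separate the real causes from the bystanders.
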
  • 15. Approach (ctd.) • Using the two counters c_m and i_m, we can compute two scores for the hypothesis that m is problematic • Borrowed from Association Rule Mining: support and confidence • N is the total number of statements in DBpedia
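
The slide's formulas are not part of the transcript. Assuming that i_m counts the statements in which mapping element m is involved in an inconsistency and c_m counts those in which m is used without one (the two questions on the previous slide), the usual association-rule definitions would read as follows. This is a reconstruction under that assumption, not a copy of the slide:

```latex
\[
  \mathrm{support}(m) = \frac{i_m}{N},
  \qquad
  \mathrm{confidence}(m) = \frac{i_m}{i_m + c_m}
\]
```
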
  • 16. Identifying Interesting Problems • Hypothesis: mapping elements with high support and high confidence hint at problems worth investigating – High support: fixing the issue would fix a lot of individual statements – High confidence: this mapping element actually hints at the root cause • i.e., fixing this does not break many other things • Unfortunately, the two scores are on different scales – Difficult to use average, harmonic mean or the like – Support: μ = 0.0002, σ = 0.003 – Confidence: μ = 0.114, σ = 0.260 • Fix: use logarithmic support instead – LogSupport: μ = 0.179, σ = 0.139
  • 17. Identifying Interesting Problems (ctd.) • Inspect mappings that have a high harmonic mean of confidence and log support [Figure: plot with levels 0.25, 0.5 and 0.75 marked; annotation: more interesting]
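
The scoring on slides 15 to 17 can be sketched in a few lines. Support and confidence follow the usual association-rule reading of the two counters. The log-support formula `log(i_m + 1) / log(N + 1)` is an assumption, chosen so that the value lies in [0, 1], because the transcript does not give it. The function name `score` and the counts in the usage lines are made up for illustration.

```python
import math


def score(i_m, c_m, n):
    """Score the hypothesis that mapping element m is problematic.

    i_m: statements where m is involved in an inconsistency
    c_m: statements where m is used without an inconsistency
    n:   total number of statements (must be positive)
    """
    support = i_m / n
    confidence = i_m / (i_m + c_m) if (i_m + c_m) else 0.0
    # Assumed normalisation; the transcript does not state the exact formula.
    log_support = math.log(i_m + 1) / math.log(n + 1)
    denom = confidence + log_support
    # Harmonic mean of confidence and log support.
    overall = 2 * confidence * log_support / denom if denom else 0.0
    return {
        "support": support,
        "confidence": confidence,
        "log_support": log_support,
        "overall": overall,
    }


# Made-up counts (i_m, c_m) for two hypothetical mapping elements.
candidates = {"mapping_a": (9_000, 1_000), "mapping_b": (50, 20_000)}
n = 1_000_000
ranked = sorted(
    candidates,
    key=lambda m: score(*candidates[m], n)["overall"],
    reverse=True,
)
print(ranked)  # ['mapping_a', 'mapping_b']
```

The harmonic mean is only high when both inputs are high, so an element that is frequently involved in inconsistencies but mostly used correctly (like `mapping_b`) ranks low.
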
  • 18. Example Findings • Case 1: Mapping to wrong property • Example: – The branch key in the military unit infobox is mapped to dbo:militaryBranch • but dbo:militaryBranch has dbo:Person as its domain – Correction: dbo:commandStructure – Overall score: 0.721 – Affects 12,172 statements (31% of all dbo:militaryBranch statements)
  • 19. Example Findings • Case 2: Mappings that should be removed • Example: – dbo:picture – Most of its statements are inconsistent (64.5% places, 23.0% persons) – Reason: statements are extracted from picture captions – dbr:Brixton_Academy dbo:picture dbr:Brixton . – dbr:Justify_My_Love dbo:picture dbr:Madonna_(entertainer) .
  • 20. Example Findings • Case 3: Ontology problems (domain/range) • Example 1: – Populated places (e.g., cities) are used both as place and as organization – For some properties, the range covers only one of the two • e.g., dbo:operator (see introductory example) – Polysemy should be reflected in the ontology • Example 2: – dbo:architect, dbo:designer, dbo:engineer etc. have dbo:Person as their range – Significant fractions (8.6%, 7.6%, 58.4%, resp.) have a dbo:Organisation as object – Range should be broadened
  • 21. Example Findings • Case 4: Missing properties • Example 1: – dbo:president links an organization to its president – Majority use (8,354, or 76.2%): link a person to the president s/he served for • Example 2: – dbo:instrument links an artist to the instrument s/he plays – Prominent alternative use (3,828, or 7.2%): links a genre to its characteristic instrument • Obama example alert!
  • 22. Future Work • Classify ontology, mapping, and other errors automatically – Currently ongoing: using different language editions of DBpedia • Heuristic: – Problem present in many languages → ontology problem – Problem present only in one language → mapping problem • From post-processing to live processing – e.g., on-the-fly validation in DBpedia Mappings Wiki
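
The cross-language heuristic could look roughly like this. The cut-off `min_languages` and the function name `classify_problem` are hypothetical: the slide only distinguishes "many languages" from "only one language", so anything in between is left undecided in this sketch.

```python
def classify_problem(languages_with_problem, min_languages=3):
    """Guess the error source from the language editions showing the problem.

    min_languages is an assumed cut-off for "many languages".
    """
    count = len(set(languages_with_problem))
    if count >= min_languages:
        return "ontology problem"
    if count == 1:
        return "mapping problem"
    return "undecided"


print(classify_problem(["en", "de", "fr", "es"]))  # ontology problem
print(classify_problem(["de"]))                    # mapping problem
```
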
  • 23. Take Aways • Fixing bugs in knowledge graphs is nice – But often a one-time solution – Preserving the efforts is hard • Proposed solution – Identify and address the root problem – Scoring mechanism helps identify interesting problems – Preserving the efforts by eliminating the root causes • Provenance matters! – The more we know about how a statement gets into a knowledge graph, the better we can automate the error analysis
  • 24. Data-driven Joint Debugging of the DBpedia Mappings and Ontology: Towards Addressing the Causes instead of the Symptoms of Data Quality in DBpedia. Heiko Paulheim