Bottom-up Discovery of Context-aware Quality Constraints
for Heterogeneous Knowledge Graphs
Xander Wilcke¹, Maurice de Kleijn², Victor de Boer¹, Henk Scholten², Frank van Harmelen¹
1. Dept. of Computer Science
2. Dept. of Spatial Economics
Vrije Universiteit Amsterdam, The Netherlands
KDIR 2020
Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs | KDIR 2020 2 / 29
Overview
1. Quality control - why context matters
2. Defining context-aware constraints
3. Discovering context-aware constraints
4. A two-fold evaluation, from
– an algorithmic perspective, and
– a user perspective
Knowledge as a graph
● Knowledge graphs are increasingly being adopted
– Institutes, museums, tech giants, businesses, …
– Knowledge quality is no longer optional
How to maintain the quality of the knowledge across its entire life cycle?
Quality Control
● A key component is the quality constraint
– Helps guard the consistency, accuracy, precision, etc.
– Constraint languages for knowledge graphs
  ● SHACL
  ● ShEx
Figure from Jose E. Labra Gayo et al. (2018), "Validating RDF Data", Synthesis Lectures on the Semantic Web: Theory and Technology, Vol. 7, No. 1, 1-328
Quality Control for Knowledge Graphs
● Existing constraint languages work on the schema level, e.g.
– All nodes of a certain type
– All source / destination nodes of a certain relation
Knowledge graphs allow for context-level constraints
[Figure: a node with type Bridge and material "??"]
Contextual Clusters
[Figure: contextual clusters, context-unaware versus context-aware]
Example
[Figure: example knowledge graph with nodes Bridge, River, Steel, WMA, Road, and Highway; relations type, material, crosses, salinity, max_load, and function; and literals "0.05" and "21.5"]
Contributions
1. We introduce context-aware constraints, which
● offer more fine-grained control of the domains onto which to impose restrictions
● apply to domains defined by graph motifs (contextual patterns)
● allow for multimodal pattern fragments (numbers, dates, texts, ...)
2. We also introduce an (embarrassingly parallel) bottom-up anytime algorithm to discover context-aware constraints in heterogeneous knowledge graphs
3. We evaluate 1 and 2 in a user study with experts in a real-world knowledge validation use case
Knowledge Graphs
● Graph-shaped knowledge bases
● Assertions are encoded as edges between nodes
● Nodes can be
– Entities: things, concepts, etc.
– Literals: strings, numbers, dates, etc.
● Context gives entities their meaning
[Figure: the example knowledge graph (Bridge, River, Steel, WMA, Road, Highway)]
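The bullets above can be illustrated with a minimal sketch that stores assertions as (subject, relation, object) triples. The identifiers `bridge_1` and `river_1`, and the exact edges, are assumptions made for illustration; only the labels come from the example figure.

```python
from collections import defaultdict

# Hypothetical triples, loosely based on the labels in the example figure.
# The identifiers and the exact edges are assumptions, not the original data.
TRIPLES = [
    ("bridge_1", "type", "Bridge"),      # entity -> entity
    ("bridge_1", "material", "Steel"),   # entity -> entity
    ("bridge_1", "crosses", "river_1"),  # entity -> entity
    ("river_1", "type", "River"),
    ("river_1", "salinity", 0.05),       # entity -> literal (number)
]


def context_of(entity, triples):
    """Return the outgoing assertions of an entity, grouped by relation."""
    context = defaultdict(list)
    for subject, relation, obj in triples:
        if subject == entity:
            context[relation].append(obj)
    return dict(context)


if __name__ == "__main__":
    print(context_of("bridge_1", TRIPLES))
```

In this reading, the context of `bridge_1` is simply the set of assertions around it; richer notions of context (for instance, following edges more than one step) would need a different traversal.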
Defining Context-aware Constraints
A context-aware constraint states that every entity which satisfies the antecedent must also satisfy the consequent.
Here, the antecedent and the consequent are assertion patterns.
An assertion pattern states that there exists a relation between any sets of nodes that match its two pattern variables.
Pattern variables can express
● Any specific node (entity or literal)
● All entities of a type t (object-type)
● All literals of a datatype dt (data-type)
● All literals which match a regular expression s (value-type)
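One possible way to read these definitions is sketched below. It is an illustration under simplifying assumptions, not the authors' implementation: all names (`Literal`, `specific`, `object_type`, `data_type`, `value_type`, `AssertionPattern`, `violations`) are hypothetical, the graph is a plain set of triples, and a constraint is checked only against the outgoing edges of each entity.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: str  # e.g. "xsd:decimal"


# Pattern variables, modelled as predicates over (node, triples).
def specific(node):
    return lambda n, triples: n == node


def object_type(t):
    return lambda n, triples: (n, "type", t) in triples


def data_type(dt):
    return lambda n, triples: isinstance(n, Literal) and n.datatype == dt


def value_type(regex):
    compiled = re.compile(regex)
    return lambda n, triples: (
        isinstance(n, Literal) and compiled.fullmatch(n.value) is not None
    )


@dataclass(frozen=True)
class AssertionPattern:
    lhs: object      # pattern variable for the source node
    relation: str
    rhs: object      # pattern variable for the destination node

    def holds_for(self, entity, triples):
        """True if the entity has an outgoing edge matching this pattern."""
        return any(
            s == entity and p == self.relation
            and self.lhs(s, triples) and self.rhs(o, triples)
            for s, p, o in triples
        )


def violations(antecedent, consequent, triples):
    """Entities that satisfy the antecedent but not the consequent."""
    entities = {s for s, _, _ in triples}
    return sorted(
        e for e in entities
        if antecedent.holds_for(e, triples)
        and not consequent.holds_for(e, triples)
    )


# Hypothetical data: bridge_2 crosses a river but has no max_load.
TRIPLES = {
    ("bridge_1", "type", "Bridge"),
    ("bridge_1", "crosses", "river_1"),
    ("bridge_1", "max_load", Literal("21.5", "xsd:decimal")),
    ("bridge_2", "type", "Bridge"),
    ("bridge_2", "crosses", "river_1"),
    ("river_1", "type", "River"),
}
antecedent = AssertionPattern(object_type("Bridge"), "crosses", object_type("River"))
consequent = AssertionPattern(object_type("Bridge"), "max_load", data_type("xsd:decimal"))

if __name__ == "__main__":
    print(violations(antecedent, consequent, TRIPLES))
```

The example constraint reads: every bridge that crosses a river must have a decimal `max_load`. Under the hypothetical data, only `bridge_2` fails it.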
Examples
[Figure: example context-aware constraints]
Discovering Context-aware Constraints
● Algorithm properties
– Bottom-up: learns constraints directly from any knowledge graph
– Anytime property: longer runs yield constraints with more fine-grained domains
– Embarrassingly parallel: newly discovered constraints form a new branch of which the children can be computed independently
● Algorithm assumptions
1) The large majority of the knowledge is valid and accurate, and
2) these two qualities can be captured using frequent pattern mining
Main components
● The generation forest stores constraints in generation trees, and keeps track of progress per depth
● The explore-extend loop explores and tests increasingly complex constraints
1. Generate all constraints of depth 0 that exceed minimal support and confidence
[Figure: seven enumerated items, 1) to 7)]
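The slides do not spell out how support and confidence are computed. The sketch below assumes the usual association-rule reading: support is the number of entities that satisfy the antecedent, and confidence is the fraction of those that also satisfy the consequent. The function names and the sample entities are hypothetical.

```python
def support_and_confidence(entities, antecedent, consequent):
    """Return (support, confidence) under the assumed definitions.

    support    = number of entities that satisfy the antecedent
    confidence = fraction of those that also satisfy the consequent
    """
    domain = [e for e in entities if antecedent(e)]
    support = len(domain)
    if support == 0:
        return 0, 0.0
    satisfied = sum(1 for e in domain if consequent(e))
    return support, satisfied / support


def passes(entities, antecedent, consequent, min_support, min_confidence):
    """Keep a candidate constraint only if it meets both thresholds."""
    support, confidence = support_and_confidence(entities, antecedent, consequent)
    return support >= min_support and confidence >= min_confidence


# Hypothetical entities, each a mapping from relation to value.
ENTITIES = [
    {"type": "Bridge", "material": "Steel"},
    {"type": "Bridge", "material": "Steel"},
    {"type": "Bridge", "material": "Steel"},
    {"type": "Bridge"},  # a bridge without a material
    {"type": "Road"},
]


def is_bridge(entity):
    return entity.get("type") == "Bridge"


def has_material(entity):
    return "material" in entity


if __name__ == "__main__":
    print(support_and_confidence(ENTITIES, is_bridge, has_material))
```

With these thresholds, a candidate such as "every Bridge has a material" is kept when enough bridges exist and most of them comply; the single non-compliant bridge is then a candidate quality issue.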
2. For all constraints of the current depth: test all diagonal combinations of candidate endpoints and extensions, and add them to the next depth if they meet the minimal requirements
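The level-wise idea behind the two steps can be sketched as follows. This is a deliberately simplified stand-in: it does not reproduce the pairing of candidate endpoints and extensions used by the original algorithm, it treats entities as flat relation-to-value mappings, and it models the generation forest as a dictionary from depth to a set of constraints. All names and the sample data (including the material "Concrete") are hypothetical.

```python
def matches(entity, conditions):
    """True if the entity satisfies every (relation, value) condition."""
    return all(entity.get(rel) == val for rel, val in conditions)


def score(entities, context, consequent):
    """Support and confidence of `context -> consequent` (assumed definitions)."""
    domain = [e for e in entities if matches(e, context)]
    support = len(domain)
    if support == 0:
        return 0, 0.0
    return support, sum(matches(e, [consequent]) for e in domain) / support


def discover(entities, min_support, min_confidence, max_depth):
    """Return a forest: depth -> set of (context, consequent) constraints."""
    conditions = sorted({(r, v) for e in entities for r, v in e.items()})
    forest = {0: set()}

    # Step 1: depth-0 constraints with a single-condition context.
    for ctx in conditions:
        for cons in conditions:
            if cons[0] == ctx[0]:
                continue
            s, c = score(entities, [ctx], cons)
            if s >= min_support and c >= min_confidence:
                forest[0].add((frozenset([ctx]), cons))

    # Step 2: extend each constraint of the current depth by one condition.
    # Each parent can be extended independently of the others.
    for depth in range(max_depth):
        forest[depth + 1] = set()
        for context, cons in forest[depth]:
            used = {r for r, _ in context} | {cons[0]}
            for ext in conditions:
                if ext[0] in used:
                    continue
                new_context = context | {ext}
                s, c = score(entities, new_context, cons)
                if s >= min_support and c >= min_confidence:
                    forest[depth + 1].add((new_context, cons))
    return forest


# Hypothetical entities for illustration only.
ENTITIES = [
    {"type": "Bridge", "crosses": "River", "material": "Steel"},
    {"type": "Bridge", "crosses": "River", "material": "Steel"},
    {"type": "Bridge", "crosses": "Road", "material": "Concrete"},
    {"type": "Bridge", "crosses": "Road", "material": "Concrete"},
]

if __name__ == "__main__":
    forest = discover(ENTITIES, min_support=2, min_confidence=1.0, max_depth=1)
    for depth, constraints in forest.items():
        print(depth, len(constraints))
```

In this toy data, "type Bridge" alone does not determine the material, but the narrower context "crosses River" does, which mirrors the motivation for context-level constraints. Stopping the loop after any depth still leaves a usable forest, which is the sense in which such a procedure is anytime.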
Graph Perspective
[Figure: the discovery process from a graph perspective]
Experiments & Evaluation
● Algorithmic perspective
– Goal: to determine the trade-off between chosen support and confidence, and the constraints they yield
– Form: grid search on 3 distinctly different datasets with support and confidence as parameters
● User perspective
– Goal: to assess the effectiveness of our method to discover constraints that are useful for quality control
– Form: a structured user evaluation with knowledge-management experts in a real-world knowledge validation use case
Evaluation – Algorithmic Perspective
● Strong positive correlation between the number of discovered constraints and the chosen support and confidence values (Table 3)
● Number of relations (cf. dataset size) is likely the main contributor to the number of discovered constraints
● Positive correlation between the number of pruned and discovered constraints suggests that the pruning strategy is, to an extent, effective (Table 3)
Evaluation – User Perspective
● Structured User Evaluation
– Workshop hosted at Rijkswaterstaat, The Netherlands
– Domain of asset management and civil engineering
– 21 participants, all experts on knowledge maintenance and validation
– Asked to assess constraints on usefulness and graininess. Constraints are divided into 3x4 groups of increasing complexity (unbeknownst to participants)
[Figure: Rijkswaterstaat, the Netherlands]
● More than half of the participants thought the complexity of the discovered constraints was well balanced (Table 8)
● There is little difference in scores between the three complexity groups, suggesting no interaction or too little difference between groups (Table 9)
● Overall fair to moderate agreement on usefulness between participants, but significant differences in agreement between complexity groups (Table 9)
● Neutral to agreeable stance with respect to the overall usefulness of our method, but a considerable portion was unsure, likely due to lack of familiarity with the domain (Table 10)
Conclusion
● Context-aware constraints are, to an extent, useful for knowledge validation tasks, and, for the most part, well-balanced with respect to complexity
● No direct relationship between the dimensions of a graph and the number of discovered constraints. This makes it difficult to apply a rule of thumb to the support and confidence values
● Scalability remains a practical challenge, but is partly alleviated by our pruning and optimization strategies, and by parallelizing the task
● Analysis of our algorithm's time complexity fell outside the current scope, and should be investigated in future work
Thank You
● Slides available at tinyurl.com/yyzr5876
● Code available at gitlab.com/wxwilcke/cckg
● Data available at gitlab.com/wxwilcke/mmkg

Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs

  • 1. Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs Xander Wilcke1 , Maurice de Kleijn2 , Victor de Boer1 , Henk Scholten2 , Frank van Harmelen1 1. Dept. of Computer Science 2. Dept. of Spatial Economics Vrije Universiteit Amsterdam, The Netherlands KDIR 2020
  • 2. Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs | KDIR 2020 2 / 29 Overview 1. Quality control - why context matters 2. Defining context-aware constraints 3. Discovering context-aware constraints 4. A two-fold evaluation, from  an algorithmic perspective, and  a user perspective
  • 3. Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs | KDIR 2020 3 / 29 Knowledge as a graph ● Knowledge Graphs are getting increasingly adopted – Institutes, museums, tech giants, businesses, … – Knowledge quality is no longer optional How to maintain the quality of the knowledge across its entire life cycle?
  • 4. Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs | KDIR 2020 4 / 29 Quality Control ● A key component is the quality constraint – Helps guard the consistency, accuracy, precision, etc. – Constraint languages for knowledge graphs ● SHACL ● ShEx Figure from “Jose E. Labra Gayo et al. (2018) Validating RDF Data, Synthesis Lectures on the Semantic Web: Theory and Technology, Vol. 7, No. 1, 1-328”
  • 5. Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs | KDIR 2020 5 / 29 Quality Control for Knowledge Graphs ● Existing constraint languages work on the schema level e.g. – All nodes of a certain type: – All source / destination nodes of a certain relation: Bridge type material ??
  • 6. Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs | KDIR 2020 6 / 29 Quality Control for Knowledge Graphs ● Existing constraint languages work on the schema level e.g. – All nodes of a certain type: – All source / destination nodes of a certain relation: Bridge type Knowledge graphs allow for context-level constraints material ??
  • 7. Contextual Clusters (figure panels: Context Unaware, Context Aware)
  • 8. Contextual Clusters (figure panels: Context Unaware, Context Aware)
  • 9. Example (figure: example knowledge graph; labels: Bridge, type, material, crosses, function, max_load, salinity, River, Road, Highway, Steel, WMA, "0.05", "21.5")
  • 10. Example (figure: example knowledge graph; labels: Bridge, type, material, crosses, function, max_load, salinity, River, Road, Highway, Steel, WMA, "0.05", "21.5")
  • 11. Contributions 1. We introduce context-aware constraints, which ● offer more fine-grained control over the domains onto which restrictions are imposed ● apply to domains defined by graph motifs (contextual patterns) ● allow for multimodal pattern fragments (numbers, dates, texts, ...) 2. We also introduce an (embarrassingly parallel) bottom-up anytime algorithm to discover context-aware constraints in heterogeneous knowledge graphs 3. We evaluate 1 and 2 in a user study with experts in a real-world knowledge validation use case
  • 12. Knowledge Graphs ● Graph-shaped knowledge bases ● Assertions are encoded as edges between nodes ● Nodes can be – Entities: things, concepts, etc. – Literals: strings, numbers, dates, etc. ● Context gives entities their meaning (figure: the example Bridge knowledge graph)
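As an illustration of the slide above, a knowledge graph can be sketched as a set of assertions whose endpoints are entities or literals. The identifiers (`bridge_1`, `river_1`), the `xsd:decimal` datatype and the exact edges are assumptions made for this example; only the labels come from the slides. The helper `context` is likewise hypothetical and returns the assertions that touch an entity.

```python
from typing import NamedTuple, Set, Union


class Literal(NamedTuple):
    """A literal node: a raw value plus its datatype."""
    value: str
    datatype: str


Node = Union[str, Literal]  # entities are plain strings in this sketch


class Assertion(NamedTuple):
    """One edge of the graph: subject --relation--> obj."""
    subject: str
    relation: str
    obj: Node


# Toy graph; the edge structure is assumed, not taken from the slides.
graph: Set[Assertion] = {
    Assertion("bridge_1", "type", "Bridge"),
    Assertion("bridge_1", "material", "Steel"),
    Assertion("bridge_1", "crosses", "river_1"),
    Assertion("bridge_1", "max_load", Literal("21.5", "xsd:decimal")),
    Assertion("river_1", "type", "River"),
    Assertion("river_1", "salinity", Literal("0.05", "xsd:decimal")),
}


def context(entity: str, graph: Set[Assertion]) -> Set[Assertion]:
    """Return every assertion in which the entity is subject or object."""
    return {a for a in graph if a.subject == entity or a.obj == entity}
```

Here `context("river_1", graph)` collects the river's own `type` and `salinity` assertions together with the `crosses` edge that points at it.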
  • 13. Defining Context-aware Constraints ● A context-aware constraint states that every entity which satisfies the antecedent must also satisfy the consequent; here, both the antecedent and the consequent are built from assertion patterns ● An assertion pattern states that there exists a relation between any sets of nodes that match its pattern variables
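The implication on this slide can be sketched in a few lines. This is a simplified reading, not the authors' implementation: each assertion pattern is reduced to a hypothetical `(relation, object)` pair with a fixed object, the graph is a set of `(subject, relation, object)` tuples, and all node names are made up.

```python
def holds(entity, pattern, graph):
    """True if the entity has an edge matching the (relation, object) pattern."""
    relation, obj = pattern
    return (entity, relation, obj) in graph


def violations(antecedent, consequent, graph):
    """Entities that satisfy every antecedent pattern but not the consequent."""
    entities = {s for s, _, _ in graph}
    return sorted(
        e for e in entities
        if all(holds(e, p, graph) for p in antecedent)
        and not holds(e, consequent, graph)
    )


# Hypothetical data: two bridges, only one of which has a material.
example_graph = {
    ("b1", "type", "Bridge"),
    ("b1", "material", "Steel"),
    ("b2", "type", "Bridge"),
}

# Constraint: every entity of type Bridge must have material Steel.
broken = violations([("type", "Bridge")], ("material", "Steel"), example_graph)
```

In this toy graph `broken` contains only `b2`, the bridge that matches the antecedent but lacks the required material.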
  • 14. Defining Context-aware Constraints ● Pattern variables can express – Any specific node (entity or literal) – All entities of a type t (object-type) – All literals of a datatype dt (data-type) – All literals which match a regular expression s (value-type) ● An assertion pattern states that there exists a relation between any sets of nodes that match its pattern variables
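The four kinds of pattern variable listed above could be modelled roughly as follows. The class names, the `matches` method and the `types` lookup table are hypothetical; the field names `t`, `dt` and `s` follow the slide.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: str


@dataclass(frozen=True)
class SpecificNode:
    """Matches exactly one node (entity or literal)."""
    node: object

    def matches(self, node, types):
        return node == self.node


@dataclass(frozen=True)
class ObjectType:
    """Matches all entities of type t."""
    t: str

    def matches(self, node, types):
        return not isinstance(node, Literal) and self.t in types.get(node, set())


@dataclass(frozen=True)
class DataType:
    """Matches all literals of datatype dt."""
    dt: str

    def matches(self, node, types):
        return isinstance(node, Literal) and node.datatype == self.dt


@dataclass(frozen=True)
class ValueType:
    """Matches all literals whose value fully matches the regular expression s."""
    s: str

    def matches(self, node, types):
        return isinstance(node, Literal) and re.fullmatch(self.s, node.value) is not None
```

`types` is assumed to map each entity to the set of its types, for example `{"bridge_1": {"Bridge"}}`. Using `re.fullmatch` rather than `re.search` is a design choice of this sketch, so that a value-type describes the whole literal and not just a fragment of it.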
  • 15. Examples
  • 16. Discovering Context-aware Constraints ● Algorithm properties – Bottom-up: learns constraints directly from any knowledge graph – Anytime: longer runs yield constraints with more fine-grained domains – Embarrassingly parallel: each newly discovered constraint forms a new branch whose children can be computed independently ● Algorithm assumptions 1) the large majority of the knowledge is valid and accurate, and 2) these two qualities can be captured using frequent pattern mining
  • 17. Discovering Context-aware Constraints ● Main components – The generation forest stores constraints in generation trees, and keeps track of progress per depth – The explore-extend loop explores and tests increasingly complex constraints
  • 18. Discovering Context-aware Constraints 1. Generate all constraints of depth 0 that exceed the minimal support and confidence (seven constraint forms, 1) to 7), shown as formulas on the slide)
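A rough sketch of such a depth-0 step is given below. Since the seven forms on the slide are not available here, the code assumes a single hypothetical form, "every entity of type t has relation r". Support is taken as the number of entities of type t and confidence as the fraction of those that have r, which is the usual association-rule convention and not necessarily the definition the authors use.

```python
from collections import Counter, defaultdict


def mine_depth0(graph, types, min_support, min_confidence):
    """Return (type, relation, support, confidence) for each accepted candidate.

    graph: set of (subject, relation, object) tuples
    types: dict mapping each entity to the set of its types
    """
    members_of = defaultdict(set)
    for entity, entity_types in types.items():
        for t in entity_types:
            members_of[t].add(entity)

    relations_of = defaultdict(set)
    for subject, relation, _ in graph:
        relations_of[subject].add(relation)

    found = []
    for t, members in members_of.items():
        support = len(members)
        if support < min_support:
            continue  # too few entities to trust any pattern on this type
        counts = Counter(r for e in members for r in relations_of[e])
        for relation, n in counts.items():
            confidence = n / support
            if confidence >= min_confidence:
                found.append((t, relation, support, confidence))
    return sorted(found)


# Hypothetical data: all three bridges have a material, one has a max_load.
toy_types = {"b1": {"Bridge"}, "b2": {"Bridge"}, "b3": {"Bridge"}, "r1": {"River"}}
toy_graph = {
    ("b1", "material", "Steel"),
    ("b2", "material", "Steel"),
    ("b3", "material", "WMA"),
    ("b1", "max_load", "21.5"),
    ("r1", "salinity", "0.05"),
}
accepted = mine_depth0(toy_graph, toy_types, min_support=2, min_confidence=0.6)
```

With these thresholds only the `Bridge`/`material` candidate is kept: `max_load` occurs on one bridge in three, and `River` has too few members to meet the minimal support.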
  • 19. Discovering Context-aware Constraints 2. For all constraints of the current depth: test all diagonal combinations of candidate endpoints and extensions, and add them to the next depth if they meet the minimal requirements
  • 20. Discovering Context-aware Constraints 2. For all constraints of the current depth: test all diagonal combinations of candidate endpoints and extensions, and add them to the next depth if they meet the minimal requirements
  • 21. Discovering Context-aware Constraints 2. For all constraints of the current depth: test all diagonal combinations of candidate endpoints and extensions, and add them to the next depth if they meet the minimal requirements
  • 22. Discovering Context-aware Constraints 2. For all constraints of the current depth: test all diagonal combinations of candidate endpoints and extensions, and add them to the next depth if they meet the minimal requirements
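The level-wise loop described in step 2 might look like the skeleton below. It is a generic sketch under stated assumptions, not the authors' code: `extend` and `is_valid` are hypothetical callbacks standing in for the generation of candidate extensions and for the support/confidence test, the "diagonal combinations" are not modelled, and the forest is a plain dict from depth to constraints. The time budget illustrates the anytime property, because the loop can stop at any point and still return everything found so far.

```python
import time


def explore_extend(seeds, extend, is_valid, max_depth, budget_seconds):
    """Grow constraints depth by depth until the depth limit or time budget is hit.

    seeds:    constraints of depth 0
    extend:   function(constraint) -> iterable of more specific candidates
    is_valid: function(candidate) -> bool, e.g. a support/confidence check
    """
    forest = {0: list(seeds)}
    deadline = time.monotonic() + budget_seconds
    for depth in range(max_depth):
        next_level = []
        for parent in forest[depth]:
            if time.monotonic() > deadline:
                # Anytime behaviour: keep the partial level and stop here.
                if next_level:
                    forest[depth + 1] = next_level
                return forest
            # Each parent is handled independently of its siblings, which is
            # what would allow this inner loop to be distributed over workers.
            next_level.extend(c for c in extend(parent) if is_valid(c))
        if not next_level:
            break
        forest[depth + 1] = next_level
    return forest


# Toy stand-ins: a "constraint" is a tuple of increasing integers.
def toy_extend(constraint):
    return [constraint + (k,) for k in range(constraint[-1] + 1, 4)]


def toy_is_valid(constraint):
    return sum(constraint) <= 4


toy_forest = explore_extend([(1,), (2,)], toy_extend, toy_is_valid,
                            max_depth=5, budget_seconds=10.0)
```

In the toy run the search stops after depth 1, since no depth-2 candidate passes `toy_is_valid`.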
  • 23. Discovering Context-aware Constraints: Graph Perspective
  • 24. Experiments & Evaluation ● Algorithmic perspective – Goal: to determine the trade-off between the chosen support and confidence, and the constraints they yield – Form: grid search on 3 distinctly different datasets with support and confidence as parameters ● User perspective – Goal: to assess the effectiveness of our method in discovering constraints that are useful for quality control – Form: a structured user evaluation with knowledge-management experts in a real-world knowledge validation use case
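The grid search over support and confidence can be sketched as below. The `mine` callback is a hypothetical stand-in for a full discovery run, and the candidate list with its (support, confidence) scores is made-up data used only to make the example runnable.

```python
from itertools import product


def grid_search(mine, supports, confidences):
    """Count the constraints found for every (support, confidence) pair."""
    return {
        (s, c): len(mine(s, c))
        for s, c in product(supports, confidences)
    }


# Hypothetical candidates, each scored as (support, confidence).
candidates = [(10, 0.9), (5, 0.6), (2, 1.0)]


def toy_mine(min_support, min_confidence):
    """Keep the candidates that meet both thresholds."""
    return [
        (s, c) for s, c in candidates
        if s >= min_support and c >= min_confidence
    ]


counts = grid_search(toy_mine, supports=[2, 5], confidences=[0.5, 0.9])
```

The resulting `counts` table maps each threshold pair to a number of constraints, which is the kind of quantity the algorithmic evaluation compares across datasets.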
  • 25. Evaluation – Algorithmic Perspective ● Strong positive correlation between the number of discovered constraints and the chosen support and confidence values (Table 3) ● The number of relations (cf. dataset size) is likely the main contributor to the number of discovered constraints ● Positive correlation between the number of pruned and discovered constraints suggests that the pruning strategy is, to an extent, effective (Table 3)
  • 26. Evaluation – User Perspective ● Structured User Evaluation – Workshop hosted at Rijkswaterstaat, The Netherlands – Domain of asset management and civil engineering – 21 participants, all experts on knowledge maintenance and validation – Asked to assess constraints on usefulness and graininess. Constraints are divided into 3x4 groups of increasing complexity (unbeknownst to participants)
  • 27. Evaluation – User Perspective ● More than half of the participants thought the complexity of the discovered constraints was well balanced (Table 8) ● There is little difference in scores between the three complexity groups, suggesting no interaction or too little difference between groups (Table 9) ● Overall fair to moderate agreement on usefulness between participants, but significant differences in agreement between complexity groups (Table 9) ● Neutral to agreeable stance with respect to the overall usefulness of our method, but a considerable portion was unsure, likely due to lack of familiarity with the domain (Table 10)
  • 28. Conclusion ● Context-aware constraints are, to an extent, useful for knowledge validation tasks and, for the most part, well balanced with respect to complexity ● No direct relationship between the dimensions of a graph and the number of discovered constraints. This makes it difficult to apply a rule of thumb to the support and confidence values ● Scalability remains a practical challenge, but is partly alleviated by our pruning and optimization strategies, and by parallelizing the task ● Analysis of our algorithm's time complexity fell outside the current scope, and should be investigated in future work
  • 29. Thank You ● Slides available at tinyurl.com/yyzr5876 ● Code available at gitlab.com/wxwilcke/cckg ● Data available at gitlab.com/wxwilcke/mmkg