KNOWLEDGE GRAPHS
AND THE ROLE OF DBPEDIA
Paul Groth @pgroth
pgroth.com
Thanks to Joao Moura
Elsevier Labs @elsevierlabs
6th DBpedia Community Meeting in The Hague 2016
Feb. 12, 2016
FAVORITE DBPEDIA PREDICATE….
OUTLINE
• The Importance of Structure
• Better taxonomies
• Knowledge graph construction
ELSEVIER LABS - INTRO
WORLD LEADER IN DIGITAL INFO SOLUTIONS
• Published over 330,000 articles in 2013
• Founded over 130 years ago
• Work with over 30 million scientists, students, health & information professionals
• Employ over 7,000 employees in 24 countries
• Received over 1 million submissions in 2013
• Over the last 50 years the majority of Nobel Laureates have published with Elsevier
• Over 53 million items indexed by Scopus
• Elsevier eBooks, Online Journals, Databases
• Publishes over 2,200 online journals & over 10,000 e-books
SOLUTIONS
• Elsevier R+D Solutions: Helps corporate researchers, R+D professionals, and engineers improve how they interact with, share, and apply information to solve problems using our digital workflow tools, analytics, and data.
• Elsevier Research Intelligence: Provides universities, governments, and research institutions with the resources and insights to improve institutional research strategy, management, and performance.
• Elsevier Clinical Solutions: Helps medical professionals apply trusted data and sophisticated tools to make better clinical decisions, deliver better care, and produce better healthcare outcomes.
• Elsevier Education: Helps educate highly-skilled, effective healthcare professionals, using the most advanced pedagogical tools and reference works.
CONTENT
CAPABILITIES
PLATFORMS
60% OF TIME IS SPENT ON DATA PREPARATION
STRUCTURED DATA
CONNECTING DATA TO APPS
BUILDING BETTER TAXONOMIES
• Ontologies and taxonomies help organize and query content
• Annotation
• Classification / Navigation
• Autocomplete
• Suggestion & Recommendation
• We have lots of taxonomies/ontologies
• Journal Classification for Scopus
• Mendeley classification system
• Science Direct Subject classification
• Reference Modules Hierarchies for Books
• Submission system Journal classifications
• …
• Connect to external ontologies (e.g. MeSH)
• Ontology Maintenance, Usage and Mapping
Knowledge Graph Construction and the Role of DBPedia
TAXONOMY INDUCTION
Starting with a very shallow hierarchy of syntactical concepts with almost no intersections:
1. Match concepts against a target (well-accepted) taxonomy and DBpedia.
• Problems: the same concept may have different names or terminology in different branches; multiple languages; etc.
2. Check for partial orders between these concepts, using the hierarchy of the target taxonomy and DBpedia (skos:broader).
3. Find or complete missing links between concepts.
Example: given two concepts, check whether they form a parent-child relation:
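PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Resolve both concepts through redirects, then collect their categories.
SELECT DISTINCT * WHERE {
  <http://dbpedia.org/resource/Model-checking>
      dbo:wikiPageRedirects* ?conceptChild .
  ?conceptChild dbo:wikiPageRedirects* ?redirectedChild .
  ?redirectedChild dct:subject ?subjectChild .
  <http://dbpedia.org/resource/Formal_methods>
      dbo:wikiPageRedirects* ?conceptParent .
  ?conceptParent dbo:wikiPageRedirects* ?redirectedParent .
  ?redirectedParent dct:subject ?subjectParent .
  # The pair matches when a child category has a parent category as skos:broader.
  ?subjectChild skos:broader ?subjectChildsParent .
  FILTER(?subjectChildsParent = ?subjectParent)
}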
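The same check can be sketched offline in Python over a tiny in-memory graph. Everything here is hypothetical illustration data (the toy redirect, subject and broader tables, and the helper names `resolve` and `is_parent_child`), not actual DBpedia content. It only mirrors the pattern of the query above: follow redirects to the final resource, collect its dct:subject categories, then test for a skos:broader link.

```python
# Toy stand-ins for DBpedia triples (hypothetical data, for illustration only).
REDIRECTS = {"Model-checking": "Model_checking"}      # dbo:wikiPageRedirects
SUBJECTS = {                                          # dct:subject
    "Model_checking": {"Category:Model_checking"},
    "Formal_methods": {"Category:Formal_methods"},
}
BROADER = {                                           # skos:broader
    "Category:Model_checking": {"Category:Formal_methods"},
}


def resolve(resource):
    """Follow redirects until a non-redirecting resource is reached."""
    seen = set()
    while resource in REDIRECTS and resource not in seen:
        seen.add(resource)
        resource = REDIRECTS[resource]
    return resource


def is_parent_child(child, parent):
    """True if a category of `child` has a category of `parent` as skos:broader."""
    child_subjects = SUBJECTS.get(resolve(child), set())
    parent_subjects = SUBJECTS.get(resolve(parent), set())
    return any(
        BROADER.get(subject, set()) & parent_subjects
        for subject in child_subjects
    )
```

With this toy data, `is_parent_child("Model-checking", "Formal_methods")` holds and the reverse direction does not, which is the asymmetry a partial-order check needs.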
[Figure: knowledge graph construction pipeline. Labels: Content, Taxonomy, Open Information Extraction, Triple Extraction, Surface form relations, Structured relations, Entity Resolution, Universal schema, Matrix Construction, Matrix Factorization, Factorization model, Matrix Completion, Predicted relations, Curation, Knowledge graph]
TOWARDS AN ELSEVIER KNOWLEDGE GRAPH
• Ongoing proof-of-concept work by Paul Groth, Sujit Pal and Ron Daniel of Elsevier Labs
• Unsupervised, scalable and built with off-the-shelf technologies
• Based on recent work at University College London and University of Massachusetts Amherst
• Riedel, Sebastian, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. "Relation extraction with matrix factorization and universal schemas." (2013).
[Figure: data volumes along the pipeline. Labels: 14M articles from Science Direct; 920K concepts from EMMeT; 3.3M triples; 475M triples; 49M triples; p x r matrix; p x k and k x r latent factor matrices; ~102 triples]
ENTITY RESOLUTION: GLAUCOMA
Surface form triples downsampled from 49M entity-resolved triples
ANNOTATION
• http://www.slideshare.net/SparkSummit/dictionary-based-annotation-at-scale-with-spark-by-sujit-pal
• What is the problem?
• Annotate millions of documents from different corpora.
• 14M docs from Science Direct alone.
• More from other corpora, dependency parsing, etc.
• Critical step for Machine Reading and Knowledge Graph applications.
• Why is this such a big deal?
• Takes advantage of existing linked data.
• No model training for multiple complex STM domains.
• Simple, until it has to be done at scale.
ANNOTATION PIPELINE
DICTIONARY BASED NE ANNOTATOR (SODA)
• Part of Document Annotation Pipeline.
• Annotates text with Named Entities from external Dictionaries.
• Why do we have to scale? Wikipedia KBs: 8 million entities
• Built with Open Source Components
• Apache Solr – Highly reliable, scalable and fault-tolerant search index.
• SolrTextTagger – Solr component for text tagging, uses Lucene FST technology.
• Apache OpenNLP – Machine Learning based toolkit for processing Natural Language Text.
• Apache Spark – Lightning fast, large scale data processing.
• Uses ideas from other Open Source libraries
• FuzzyWuzzy – Fuzzy String Matching like a boss.
• Contributed back to Open Source
• https://github.com/elsevierlabs-os/soda
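To make the idea of dictionary-based annotation concrete, here is a minimal Python sketch of a longest-match tagger. It is not SoDA's implementation or API (SoDA builds on Solr and SolrTextTagger, as listed above); the dictionary entries, the `ex:` identifiers and the `annotate` function are all hypothetical.

```python
import re

# Hypothetical dictionary: lower-cased surface form -> entity identifier.
DICTIONARY = {
    "glaucoma": "ex:Glaucoma",
    "visual field loss": "ex:VisualFieldLoss",
    "uveal tract": "ex:UvealTract",
}
MAX_PHRASE_LEN = max(len(name.split()) for name in DICTIONARY)


def annotate(text):
    """Return (start, end, surface, entity_id) for the longest dictionary matches."""
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    annotations = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            start, end = tokens[i][1], tokens[i + n - 1][2]
            phrase = " ".join(tok.lower() for tok, _, _ in tokens[i:i + n])
            if phrase in DICTIONARY:
                annotations.append((start, end, text[start:end], DICTIONARY[phrase]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return annotations
```

A scan like this is easy for one sentence; the scaling problem on the slides comes from running it over millions of documents against millions of dictionary entries, which is why an index-backed tagger is used instead of a Python dict.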
MATRIX CONSTRUCTION: GLAUCOMA
• Rows: entity pairs (p = 83)
• Columns: surface form relations and structured relations (r = 176)
• 83 x 176 sparse binary-valued matrix with 366 entries
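As a rough illustration of matrix construction, the sketch below builds a small binary entity-pair by relation matrix with NumPy. The surface form triples are taken from the glaucoma examples in this deck; the structured relation `ex:hasRiskFactor` and the function name `build_matrix` are made up for the example.

```python
import numpy as np

# (subject, relation, object) triples; the last relation is a made-up structured one.
TRIPLES = [
    ("glaucoma", "contributed to", "functional visual field loss"),
    ("glaucoma", "remains the second leading cause of", "functional visual field loss"),
    ("glaucoma", "develop following", "chronic inflammation of uveal tract"),
    ("glaucoma", "can appear soon in", "age over 40"),
    ("glaucoma", "ex:hasRiskFactor", "age over 40"),
]


def build_matrix(triples):
    """Rows are entity pairs (p), columns are relations (r); 1 marks an observed triple."""
    pairs = sorted({(s, o) for s, _, o in triples})
    relations = sorted({rel for _, rel, _ in triples})
    row = {pair: i for i, pair in enumerate(pairs)}
    col = {rel: j for j, rel in enumerate(relations)}
    matrix = np.zeros((len(pairs), len(relations)), dtype=np.int8)
    for s, rel, o in triples:
        matrix[row[(s, o)], col[rel]] = 1
    return matrix, pairs, relations


matrix, pairs, relations = build_matrix(TRIPLES)
```

Here p = 3 and r = 5 with 5 non-zero entries. At the scale on the slide a sparse representation would be the natural choice; a dense array is used only to keep the sketch short.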
MATRIX COMPLETION: GLAUCOMA
• Latent factor matrix (p x k) × latent factor matrix (k x r) = 83 x 176 real-valued matrix with 14,608 entries (p = 83, r = 176)
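The completion step can be sketched with a plain truncated SVD in NumPy. This is a deliberately simple stand-in, not the factorization model of Riedel et al. or of the Elsevier pipeline; the 4 x 4 toy matrix and the function name `complete` are assumptions for illustration. It shows how multiplying a p x k and a k x r factor matrix yields a real-valued score for every cell, including unobserved ones.

```python
import numpy as np

# Toy binary matrix: 4 entity pairs (rows) x 4 relations (columns), made-up values.
observed = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],   # cell (1, 1) is unobserved
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)


def complete(matrix, k):
    """Rank-k reconstruction: (p x k) row factors times (k x r) column factors."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    row_factors = u[:, :k] * s[:k]      # p x k latent factor matrix
    col_factors = vt[:k, :]             # k x r latent factor matrix
    return row_factors @ col_factors    # p x r real-valued score matrix


scores = complete(observed, k=2)
```

With k = 2 the unobserved cell (1, 1) receives a clearly positive score (roughly 0.45) because row 1 resembles row 0, while cells linking the two unrelated blocks stay at zero. That is the sense in which a low-rank model proposes unseen relations.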
PREDICTED RELATIONS: GLAUCOMA
• At threshold = 0.08
• 22 unseen relations
• F1 = 0.71
• Applications beyond knowledge graph construction
• Taxonomy and ontology maintenance
• Entity search in task-specific and/or mobile context
• Question answering
glaucoma developed many years after chronic inflammation of uveal tract
glaucoma develop following chronic inflammation of uveal tract
glaucoma can appear soon in family history of glaucoma
glaucoma can appear soon in age over 40
glaucoma the risk of functional visual field loss
glaucoma contributing causes of functional visual field loss
glaucoma contributed to functional visual field loss
glaucoma is considered the second leading cause of functional visual field loss
glaucoma remains the second leading cause of functional visual field loss
This is a unique entity, not a string
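How a threshold turns scores into predicted relations, and how F1 is then computed, can be sketched as follows. Only the threshold value 0.08 comes from the slide; the scores, the gold judgments and the function name `f1_at_threshold` are hypothetical and do not reproduce the reported F1 of 0.71.

```python
THRESHOLD = 0.08  # threshold quoted on the slide

# Hypothetical scores for unobserved (entity pair, relation) cells.
scores = {
    ("glaucoma / functional visual field loss", "contributed to"): 0.31,
    ("glaucoma / age over 40", "can appear soon in"): 0.12,
    ("glaucoma / age over 40", "contributed to"): 0.09,
    ("glaucoma / chronic inflammation of uveal tract", "develop following"): 0.05,
    ("glaucoma / functional visual field loss", "can appear soon in"): 0.02,
}
# Hypothetical gold judgments for the same cells (for example from a curator).
gold = {
    ("glaucoma / functional visual field loss", "contributed to"),
    ("glaucoma / age over 40", "can appear soon in"),
    ("glaucoma / chronic inflammation of uveal tract", "develop following"),
}


def f1_at_threshold(scores, gold, threshold):
    """Keep cells scoring at or above the threshold and score them against gold."""
    predicted = {cell for cell, score in scores.items() if score >= threshold}
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return predicted, 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return predicted, 2 * precision * recall / (precision + recall)


predicted, f1 = f1_at_threshold(scores, gold, THRESHOLD)
```

In this toy setup three cells pass the threshold, two of them are correct, and one gold relation is missed, so precision and recall are both 2/3. Lowering the threshold trades precision for recall.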
A DBPEDIA IDEA?
• Connect to the Scholarly Ecosystem
• Crossref & DataCite DOIs + ORCIDs
CONCLUSION
• DBpedia and Wikipedia KBs are great reference sources
• Beyond expected use for…
• Internal knowledge curation
• Stress testing
• We’re hiring 

Contenu connexe

Tendances

Recommender systems and information extraction for researchers
Recommender systems and information extraction for researchersRecommender systems and information extraction for researchers
Recommender systems and information extraction for researchersMarco Rossetti
 
Research data management free online courses, publisher policies
Research data management   free online courses, publisher policiesResearch data management   free online courses, publisher policies
Research data management free online courses, publisher policiesNikesh Narayanan
 
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...Susanna-Assunta Sansone
 
JISC repositories and preservation programme: Plenary presentation 2009
JISC repositories and preservation programme: Plenary presentation 2009JISC repositories and preservation programme: Plenary presentation 2009
JISC repositories and preservation programme: Plenary presentation 2009Kevin Ashley
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planC. Tobin Magle
 
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Merce Crosas
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...Carole Goble
 
Trustworthy AI and Open Science
Trustworthy AI and Open ScienceTrustworthy AI and Open Science
Trustworthy AI and Open ScienceBeth Plale
 
The Dataverse Commons
The Dataverse CommonsThe Dataverse Commons
The Dataverse CommonsMerce Crosas
 
THOR Workshop - Introduction
THOR Workshop - IntroductionTHOR Workshop - Introduction
THOR Workshop - IntroductionMaaike Duine
 
Research methodology
Research methodologyResearch methodology
Research methodologyCutLiaisons
 
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Merce Crosas
 
grlc Makes GitHub Taste Like Linked Data APIs
grlc Makes GitHub Taste Like Linked Data APIsgrlc Makes GitHub Taste Like Linked Data APIs
grlc Makes GitHub Taste Like Linked Data APIsAlbert Meroño-Peñuela
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsCarole Goble
 
Sharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags systemSharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags systemMichael Bar-Sinai
 
THOR Workshop - Services PANGAEA
THOR Workshop - Services PANGAEATHOR Workshop - Services PANGAEA
THOR Workshop - Services PANGAEAMaaike Duine
 
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Publishing and Consuming FAIR DataA Case in the Agri-Food DomainPublishing and Consuming FAIR DataA Case in the Agri-Food Domain
Publishing and Consuming FAIR Data A Case in the Agri-Food DomainRothamsted Research, UK
 
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...Stefan Schmunk
 

Tendances (20)

Recommender systems and information extraction for researchers
Recommender systems and information extraction for researchersRecommender systems and information extraction for researchers
Recommender systems and information extraction for researchers
 
Research data management free online courses, publisher policies
Research data management   free online courses, publisher policiesResearch data management   free online courses, publisher policies
Research data management free online courses, publisher policies
 
Meadows apr28-1
Meadows apr28-1Meadows apr28-1
Meadows apr28-1
 
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
 
JISC repositories and preservation programme: Plenary presentation 2009
JISC repositories and preservation programme: Plenary presentation 2009JISC repositories and preservation programme: Plenary presentation 2009
JISC repositories and preservation programme: Plenary presentation 2009
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management plan
 
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
Trustworthy AI and Open Science
Trustworthy AI and Open ScienceTrustworthy AI and Open Science
Trustworthy AI and Open Science
 
The Dataverse Commons
The Dataverse CommonsThe Dataverse Commons
The Dataverse Commons
 
THOR Workshop - Introduction
THOR Workshop - IntroductionTHOR Workshop - Introduction
THOR Workshop - Introduction
 
Research methodology
Research methodologyResearch methodology
Research methodology
 
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
 
grlc Makes GitHub Taste Like Linked Data APIs
grlc Makes GitHub Taste Like Linked Data APIsgrlc Makes GitHub Taste Like Linked Data APIs
grlc Makes GitHub Taste Like Linked Data APIs
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trends
 
Sharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags systemSharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags system
 
THOR Workshop - Services PANGAEA
THOR Workshop - Services PANGAEATHOR Workshop - Services PANGAEA
THOR Workshop - Services PANGAEA
 
CST3590 Nov 2021
CST3590 Nov 2021 CST3590 Nov 2021
CST3590 Nov 2021
 
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Publishing and Consuming FAIR DataA Case in the Agri-Food DomainPublishing and Consuming FAIR DataA Case in the Agri-Food Domain
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
 
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
 

Similaire à Knowledge Graph Construction and the Role of DBPedia

20130622 okfn hackathon t2
20130622 okfn hackathon t220130622 okfn hackathon t2
20130622 okfn hackathon t2Seonho Kim
 
How the Web can change social science research (including yours)
How the Web can change social science research (including yours)How the Web can change social science research (including yours)
How the Web can change social science research (including yours)Frank van Harmelen
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...Armin Haller
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 
SKOS as the focal point of linked data strategies
SKOS as the focal point of linked data strategiesSKOS as the focal point of linked data strategies
SKOS as the focal point of linked data strategiesSemantic Web Company
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so farEnrico Daga
 
Enterprise linked data - open or closed, Andreas Blumauer, Keynote SMWCon 2014
Enterprise linked data - open or closed, Andreas Blumauer, Keynote SMWCon 2014Enterprise linked data - open or closed, Andreas Blumauer, Keynote SMWCon 2014
Enterprise linked data - open or closed, Andreas Blumauer, Keynote SMWCon 2014KDZ - Zentrum für Verwaltungsforschung
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021Gérard Dupont
 
Linked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryLinked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryRuben Schalk
 
Hide the Stack: Toward Usable Linked Data
Hide the Stack:Toward Usable Linked DataHide the Stack:Toward Usable Linked Data
Hide the Stack: Toward Usable Linked Dataaba-sah
 
Linked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesLinked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesChristophe Guéret
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsPaul Groth
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedSören Auer
 
Scientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an OverviewScientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an OverviewAngelo Salatino
 
Vital AI: Big Data Modeling
Vital AI: Big Data ModelingVital AI: Big Data Modeling
Vital AI: Big Data ModelingVital.AI
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale Bernadette Hyland-Wood
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked dataLaura Po
 
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...giuseppe_futia
 

Similaire à Knowledge Graph Construction and the Role of DBPedia (20)

20130622 okfn hackathon t2
20130622 okfn hackathon t220130622 okfn hackathon t2
20130622 okfn hackathon t2
 
Probabilistic Topic models
Probabilistic Topic modelsProbabilistic Topic models
Probabilistic Topic models
 
How the Web can change social science research (including yours)
How the Web can change social science research (including yours)How the Web can change social science research (including yours)
How the Web can change social science research (including yours)
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
SKOS as the focal point of linked data strategies
SKOS as the focal point of linked data strategiesSKOS as the focal point of linked data strategies
SKOS as the focal point of linked data strategies
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so far
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
Enterprise linked data - open or closed, Andreas Blumauer, Keynote SMWCon 2014
Enterprise linked data - open or closed, Andreas Blumauer, Keynote SMWCon 2014Enterprise linked data - open or closed, Andreas Blumauer, Keynote SMWCon 2014
Enterprise linked data - open or closed, Andreas Blumauer, Keynote SMWCon 2014
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021
 
Linked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryLinked Open Data Utrecht University Library
Linked Open Data Utrecht University Library
 
Hide the Stack: Toward Usable Linked Data
Hide the Stack:Toward Usable Linked DataHide the Stack:Toward Usable Linked Data
Hide the Stack: Toward Usable Linked Data
 
Linked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesLinked Open Data for Digital Humanities
Linked Open Data for Digital Humanities
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
Scientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an OverviewScientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an Overview
 
Vital AI: Big Data Modeling
Vital AI: Big Data ModelingVital AI: Big Data Modeling
Vital AI: Big Data Modeling
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked data
 
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
 

Plus de Paul Groth

Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIPaul Groth
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningPaul Groth
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Paul Groth
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-cziPaul Groth
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph MaintenancePaul Groth
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph FuturesPaul Groth
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph MaintenancePaul Groth
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of DataPaul Groth
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text Paul Groth
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data ShowcasingPaul Groth
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphPaul Groth
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for SciencePaul Groth
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?Paul Groth
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsPaul Groth
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationPaul Groth
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsPaul Groth
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsPaul Groth
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chainPaul Groth
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicinePaul Groth
 

Plus de Paul Groth (20)

Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AI
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learning
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-czi
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data Showcasing
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge Graph
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for Science
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domains
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computation
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chain
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 

Knowledge Graph Construction and the Role of DBPedia

  • 1. KNOWLEDGE GRAPHS AND THE ROLE OF DBPEDIA Paul Groth @pgroth pgroth.com Thanks to Joao Moura Elsevier Labs @elsevierlabs 6th DBpedia Community Meeting in The Hague 2016 Feb. 12, 2016
  • 3. OUTLINE • The Importance of Structure • Better taxonomies • Knowledge graph construction
  • 4. ELSEVIER LABS - INTRO WORLD LEADER IN DIGITAL INFO SOLUTIONS • Published over 330,000 articles in 2013 • Founded over 130 years ago • Work with over 30 million scientists, students, health & information professionals • Employ over 7,000 employees in 24 countries • Received over 1 million submissions in 2013 • Over the last 50 years the majority of Nobel Laureates have published with Elsevier • Over 53 million items indexed by Scopus • Elsevier eBooks, Online Journals, Databases: publishes over 2,200 online journals & over 10,000 e-books SOLUTIONS • Elsevier R+D Solutions: helps corporate researchers, R+D professionals, and engineers improve how they interact with, share, and apply information to solve problems using our digital workflow tools, analytics, and data. • Elsevier Research Intelligence: provides universities, governments, and research institutions with the resources and insights to improve institutional research strategy, management, and performance. • Elsevier Clinical Solutions: helps medical professionals apply trusted data and sophisticated tools to make better clinical decisions, deliver better care, and produce better healthcare outcomes. • Elsevier Education: helps educate highly-skilled, effective healthcare professionals, using the most advanced pedagogical tools and reference works. CONTENT CAPABILITIES PLATFORMS
  • 5. 60 % OF TIME IS SPENT ON DATA PREPARATION
  • 9. BUILDING BETTER TAXONOMIES • Ontologies and taxonomies help organize and query content • Annotation • Classification / Navigation • Autocomplete • Suggestion & Recommendation • We have lots of taxonomies/ontologies • Journal Classification for Scopus • Mendeley classification system • Science Direct Subject classification • Reference Modules Hierarchies for Books • Submission system Journal classifications • … • Connect to external ontologies (e.g. MESH) • Ontology Maintenance, Usage and Mapping
  • 11. TAXONOMY INDUCTION Starting with a very shallow hierarchy of syntactical concepts with almost no intersections: 1. Match concepts against a well-accepted target taxonomy and DBpedia. • Problems: the same concept may have different names or terminology in different branches; multiple languages, etc. 2. Check for partial orders between these concepts, using the hierarchy of the target taxonomy and DBpedia (skos:broader). 3. Find/complete missing links between concepts.
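Step 2 of the procedure above (checking for a partial order via skos:broader) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `SUBJECTS` and `BROADER` dictionaries are hypothetical toy stand-ins for `dct:subject` and `skos:broader` data, and the `max_depth` cut-off is an assumption added to keep the walk bounded.

```python
from collections import deque

# Hypothetical toy data: concept -> categories (stand-in for dct:subject).
SUBJECTS = {
    "Model_checking": {"Category:Model_checking"},
    "Formal_methods": {"Category:Formal_methods"},
}
# Hypothetical toy data: category -> broader categories (stand-in for skos:broader).
BROADER = {
    "Category:Model_checking": {"Category:Formal_methods"},
    "Category:Formal_methods": {"Category:Software_engineering"},
}

def ancestors(category, broader, max_depth=3):
    """Collect categories reachable via broader links within max_depth hops."""
    seen = set()
    queue = deque([(category, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue
        for parent in broader.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return seen

def is_child_of(child, parent, subjects=SUBJECTS, broader=BROADER, max_depth=3):
    """True if some category of `child` has a category of `parent` as an ancestor."""
    parent_cats = subjects.get(parent, set())
    return any(
        ancestors(cat, broader, max_depth) & parent_cats
        for cat in subjects.get(child, ())
    )
```

Because the check is directional, running it on every ordered pair of matched concepts yields candidate parent-child edges; pairs with no edge in either direction are the candidates for step 3.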
  • 12. Example: given two concepts, check whether they form a parent-child relation:
      select distinct * where {
        <http://dbpedia.org/resource/Model-checking> dbo:wikiPageRedirects* ?conceptChild .
        ?conceptChild dbo:wikiPageRedirects* ?redirectedChild .
        ?redirectedChild dct:subject ?subjectChild .
        <http://dbpedia.org/resource/Formal_methods> dbo:wikiPageRedirects* ?conceptParent .
        ?conceptParent dbo:wikiPageRedirects* ?redirectedParent .
        ?redirectedParent dct:subject ?subjectParent .
        ?subjectChild skos:broader ?subjectChildsParent
        filter(?subjectChildsParent = ?subjectParent)
      }
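To run this check over many concept pairs, the query pattern can be generated from a template. The sketch below is an assumption-laden simplification rather than the slide's exact query: it follows redirects with a single property path per concept and expresses the final comparison as a join instead of a FILTER. The helper name `parent_child_query` is hypothetical; the prefix IRIs are the standard DBpedia ontology, Dublin Core terms and SKOS namespaces.

```python
DBR = "http://dbpedia.org/resource/"

def parent_child_query(child, parent):
    """Return a SPARQL query string testing child/parent via DBpedia categories."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT * WHERE {{
  <{DBR}{child}> dbo:wikiPageRedirects* ?redirectedChild .
  ?redirectedChild dct:subject ?subjectChild .
  <{DBR}{parent}> dbo:wikiPageRedirects* ?redirectedParent .
  ?redirectedParent dct:subject ?subjectParent .
  ?subjectChild skos:broader ?subjectParent .
}}
""".strip()

query = parent_child_query("Model-checking", "Formal_methods")
```

A non-empty result set for the generated query would indicate that at least one category of the child has a category of the parent as its direct broader category.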
  • 13. TOWARDS AN ELSEVIER KNOWLEDGE GRAPH • Ongoing proof-of-concept work by Paul Groth, Sujit Pal and Ron Daniel of Elsevier Labs • Unsupervised, scalable and built with off-the-shelf technologies • Based on recent work at University College London and University of Massachusetts Amherst • Riedel, Sebastian, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. "Relation extraction with matrix factorization and universal schemas." (2013). [Pipeline diagram. Stage labels: Content, Triple Extraction, Open Information Extraction, Entity Resolution, Matrix Construction, Matrix Factorization, Matrix Completion, Curation. Data labels: Taxonomy, Surface form relations, Structured relations, Universal schema, Factorization model, Predicted relations, Knowledge graph. Annotations: 14M articles from Science Direct; 920K concepts from EMMeT; 475M triples; 49M triples; 3.3M triples; ~102 triples; p x r matrix; p x k, k x r latent factor matrices]
  • 14. ENTITY RESOLUTION: GLAUCOMA Surface form triples downsampled from 49M entity-resolved triples
  • 15. ANNOTATION • http://www.slideshare.net/SparkSummit/dictionary-based-annotation-at-scale-with-spark-by-sujit-pal • What is the problem? • Annotate millions of documents from different corpora. • 14M docs from Science Direct alone. • More from other corpora, dependency parsing, etc. • Critical step for Machine Reading and Knowledge Graph applications. • Why is this such a big deal? • Takes advantage of existing linked data. • No model training for multiple complex STM domains. • However, simple until done at scale.
  • 17. DICTIONARY BASED NE ANNOTATOR (SODA) • Part of Document Annotation Pipeline. • Annotates text with Named Entities from external Dictionaries. • Why do we have to scale? Wikipedia KBs – 8 million entities • Built with Open Source Components • Apache Solr – Highly reliable, scalable and fault-tolerant search index. • SolrTextTagger – Solr component for text tagging, uses Lucene FST technology. • Apache OpenNLP – Machine Learning based toolkit for processing Natural Language Text. • Apache Spark – Lightning fast, large scale data processing. • Uses ideas from other Open Source libraries • FuzzyWuzzy – Fuzzy String Matching like a boss. • Contributed back to Open Source • https://github.com/elsevierlabs-os/soda
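The core idea of dictionary-based annotation can be illustrated with a greedy longest-match tagger. This is a toy stand-in, not SoDA: per the slide, SoDA relies on SolrTextTagger and Lucene FSTs to scale to millions of entities, whereas the sketch below uses a plain Python dict. The function name `annotate`, the `max_len` window and the entity IDs in the usage example are all hypothetical.

```python
import re

def annotate(text, dictionary, max_len=5):
    """Greedy longest-match tagging of dictionary phrases (case-insensitive).

    `dictionary` maps lowercased, single-space-separated phrases to entity IDs.
    Returns a list of (start_offset, end_offset, entity_id) tuples.
    """
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    annotations = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first, then shrink the window.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tok[0] for tok in tokens[i:i + n])
            if phrase in dictionary:
                match = (tokens[i][1], tokens[i + n - 1][2], dictionary[phrase])
                i += n
                break
        if match:
            annotations.append(match)
        else:
            i += 1
    return annotations

# Hypothetical dictionary entries and IDs, for illustration only.
example = annotate(
    "Glaucoma can cause visual field loss.",
    {"glaucoma": "ent:glaucoma", "visual field loss": "ent:vfl"},
)
```

Each annotation carries character offsets, so the tagged spans can be mapped back onto the source document.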
  • 18. TOWARDS AN ELSEVIER KNOWLEDGE GRAPH [Pipeline diagram repeated from slide 13, with the same stage labels and triple counts]
  • 19. MATRIX CONSTRUCTION: GLAUCOMA • 83 x 176 sparse binary-valued matrix with 366 entries • Rows: entity pairs (p = 83) • Columns: surface form relations and structured relations (r = 176)
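A minimal sketch of the matrix construction step, assuming triples arrive as (subject, relation, object) tuples: each distinct entity pair becomes a row, each distinct relation a column, and only observed cells are stored. The function name `build_matrix` and the set-of-cells representation are illustrative choices, not taken from the slides; the sample triples are borrowed from the glaucoma examples later in the deck.

```python
def build_matrix(triples):
    """Map (subject, relation, object) triples to a sparse binary matrix.

    Rows are entity pairs, columns are relations; the matrix is returned as
    the set of (row, column) index pairs whose value is 1.
    """
    pairs = sorted({(s, o) for s, _, o in triples})
    relations = sorted({r for _, r, _ in triples})
    row = {pair: i for i, pair in enumerate(pairs)}
    col = {rel: j for j, rel in enumerate(relations)}
    cells = {(row[(s, o)], col[r]) for s, r, o in triples}
    return pairs, relations, cells

pairs, relations, cells = build_matrix([
    ("glaucoma", "contributed to", "functional visual field loss"),
    ("glaucoma", "the risk of", "functional visual field loss"),
    ("glaucoma", "can appear soon in", "age over 40"),
])
density = len(cells) / (len(pairs) * len(relations))
```

For the glaucoma matrix on the slide, 366 observed cells out of 83 x 176 = 14,608 give a density of about 2.5%, which is why a sparse representation is the natural fit.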
  • 20. MATRIX COMPLETION: GLAUCOMA • Latent factor matrix × latent factor matrix = 83 x 176 real-valued matrix with 14,608 entries (p = 83, r = 176)
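The completion step multiplies a p x k factor matrix by a k x r factor matrix to obtain a score for every cell, observed or not. The sketch below shows only that product; it makes several assumptions not stated on the slide: the factors are random rather than learned, k = 10 is an arbitrary choice, and the logistic squashing of each dot product is one common modelling choice for this kind of factorization rather than a confirmed detail of the Elsevier pipeline.

```python
import math
import random

def complete(pair_factors, relation_factors):
    """Return the dense p x r score matrix sigmoid(P @ R) from latent factors."""
    k = len(relation_factors)
    r = len(relation_factors[0])
    return [
        [
            1.0 / (1.0 + math.exp(-sum(row[f] * relation_factors[f][j]
                                       for f in range(k))))
            for j in range(r)
        ]
        for row in pair_factors
    ]

rng = random.Random(0)
p, k, r = 83, 10, 176  # p and r from the slide; k = 10 is an arbitrary assumption
P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(p)]  # untrained factors
R = [[rng.gauss(0, 0.1) for _ in range(r)] for _ in range(k)]  # untrained factors
scores = complete(P, R)
```

In a real system the two factor matrices would be fitted so that observed cells score high; the dense product then fills in the remaining cells with real-valued scores.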
  • 21. PREDICTED RELATIONS: GLAUCOMA • At threshold = 0.08 • 22 unseen relations • F1 = 0.71 • Applications beyond knowledge graph construction • Taxonomy and ontology maintenance • Entity search in task-specific and/or mobile context • Question answering • Example triples (subject; relation; object): (glaucoma; developed many years after; chronic inflammation of uveal tract) (glaucoma; develop following; chronic inflammation of uveal tract) (glaucoma; can appear soon in; family history of glaucoma) (glaucoma; can appear soon in; age over 40) (glaucoma; the risk of; functional visual field loss) (glaucoma; contributing causes of; functional visual field loss) (glaucoma; contributed to; functional visual field loss) (glaucoma; is considered the second leading cause of; functional visual field loss) (glaucoma; remains the second leading cause of; functional visual field loss) • Callout: this is a unique entity, not a string
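Turning completed scores into predicted relations amounts to thresholding the cells that were not already observed, then scoring the result against a reference set. The sketch below assumes scores are stored as a list of rows and relations as (row, column) index pairs; how the F1 on the slide was actually computed is not stated, so the set-based `f1_score` here is only one plausible reading, and all numbers in the usage example are made up.

```python
def predict_unseen(scores, observed, threshold):
    """Cells scoring at or above `threshold` that were not already observed."""
    return {
        (i, j)
        for i, row in enumerate(scores)
        for j, value in enumerate(row)
        if value >= threshold and (i, j) not in observed
    }

def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over sets of cells."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

# Made-up toy scores; 0.08 mirrors the threshold quoted on the slide.
toy_scores = [[0.9, 0.05, 0.2], [0.01, 0.5, 0.07]]
predicted = predict_unseen(toy_scores, observed={(0, 0)}, threshold=0.08)
```

Lowering the threshold admits more unseen relations at the cost of precision, so the threshold is the main knob for trading recall against curation effort.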
  • 22. A DBPEDIA IDEA? • Connect to the Scholarly Ecosystem • Crossref & DataCite DOIs + ORCIDs
  • 23. CONCLUSION • DBpedia and Wikipedia KBs are great reference sources • Beyond expected use for… • Internal knowledge curation • Stress testing • We’re hiring

Editor's notes

  1. http://wiki.dbpedia.org/meetings/TheHague2016
  2. More context…
  3. Elsevier Labs in context
  4. NASA, A.40 Computational Modeling Algorithms and Cyberinfrastructure, tech. report, NASA, 19 Dec. 2011
  5. “Mendeley Suggest” is our personalised article recommender. It is based on what users have in their libraries and recommends other related articles. It uses taxonomies.
  6. Can we do structuring automatically?