This is a thesis presentation about interlinking educational data to Web of Data. I explain how I used the Linked Data approach to expose and interlink educational data to the Linked Open Data cloud
Interlinking educational data to Web of Data (Thesis presentation)
1. International doctorate thesis
Interlinking Educational data to
Web of Data
Presented by: Enayat Rajabi
Supervisors: Salvador Sanchez-Alonso
Miguel-Ángel Sicilia
May 2015
2. Agenda
1. Research context
2. Motivation
3. State of the art
4. General objective & approach
5. Specific objectives
6. Studies & experimentations
7. Conclusion & future work
3. Research context
Linked Data
An approach for exposing structured data (triples) on the Web
Currently, the LOD cloud includes ~10,000 datasets (88 billion triples) in different domains
Datasets include metadata about objects
Interlinking tools help publishers to link datasets
[Figure: the LOD cloud in May 2007, 12 datasets]
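To make the notion of a triple concrete, the following minimal Python sketch serializes one statement as an N-Triples line. The resource URI and the title value are made-up examples; only the Dublin Core Terms namespace is a real vocabulary. Escaping is limited to backslashes and double quotes, so this is an illustration rather than a full serializer.

```python
# Dublin Core Terms namespace (a real, widely used vocabulary).
DCTERMS = "http://purl.org/dc/terms/"


def literal_triple(subject, predicate, value):
    """Serialize one (subject, predicate, literal) statement as an N-Triples line."""
    # Escape backslashes and double quotes; newlines are not handled in this sketch.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


# Hypothetical learning resource, used only for illustration.
resource = "http://example.org/resource/42"
line = literal_triple(resource, DCTERMS + "title", "Nuclear Energy")
print(line)
```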
4. Research context
eLearning
eLearning repositories (including educational data)
eLearning metadata schemas (Dublin Core, IEEE LOM, …)
Analysis of the largest educational repository (GLOBE), with around one million metadata records
5. Motivation
An increasing number of educational resources are published on the Web.
Some of these resources are implicitly or semantically related to each other.
The Linked Data approach allows resources to be reusable and accessible to learners.
A number of tools exist for exposing data and for semi-automatic linking between datasets.
6. State of the art (background)
Practical steps for exposing data as Linked Data
Selecting a proper schema or ontology
Converting data into a structured format
Mapping data according to the ontology
Importing the RDF dump to a triple store
• Is the (meta)data structure flat or hierarchical?
• Is there any mapping tool to convert data?
• Creating a dump file
• Selecting a proper triple store
• Setting up a SPARQL endpoint
8. State of the art
Interlinking educational data
Studying the importance of interlinking in an educational context (Stefan Dietze, 2012)
Exposing IEEE LOM as RDF
RDF binding of some IEEE LOM elements (Nilsson & Palmér, 2002)
Interlinking tools
Theoretical comparison of interlinking tools (Wölger et al., 2011)
11. Specific objectives
1. Analyzing an eLearning metadata schema for exposing it as Linked Open Data
2. Examining the datasets in the Linked Open Data cloud
3. Investigating existing interlinking tools in an educational context
4. Assessing the interlinking results and their advantages
12. Objective 1:
Analyzing a metadata schema for
exposing it as Linked Open Data
13. Exposing a flat schema
DCTerms:title
DCTerms:date
DCTerms:publisher
Mapping an RDB to Dublin Core
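The mapping from a relational table to Dublin Core can be sketched in a few lines of Python. The table name, column names, base URI, and the column-to-property mapping below are all hypothetical and chosen only for illustration; the slides do not specify the actual database or mapping tool.

```python
import sqlite3

DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping from table columns to Dublin Core Terms properties.
COLUMN_TO_PROPERTY = {
    "title": DCTERMS + "title",
    "pub_date": DCTERMS + "date",
    "publisher": DCTERMS + "publisher",
}


def rows_to_triples(conn, base_uri):
    """Yield one (subject, predicate, literal) triple per non-empty mapped cell."""
    cursor = conn.execute("SELECT id, title, pub_date, publisher FROM resources")
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{base_uri}{record['id']}"
        for column, prop in COLUMN_TO_PROPERTY.items():
            if record[column]:  # skip NULL and empty values
                yield (subject, prop, str(record[column]))


# In-memory toy database standing in for a real repository.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources (id INTEGER, title TEXT, pub_date TEXT, publisher TEXT)"
)
conn.execute("INSERT INTO resources VALUES (1, 'Nuclear Energy', '2015-05-01', NULL)")
triples = list(rows_to_triples(conn, "http://example.org/resource/"))
for triple in triples:
    print(triple)
```

Because the schema is flat, each column maps to exactly one property and no nesting has to be resolved.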
14. Exposing a complex schema (IEEE LOM)
15. IEEE LOM ontology
The IEEE LOM schema has a hierarchical structure and supports different kinds of data types, so we had to:
Map the data types
Specify a correct element for identifier (URI)
Choose a strategy for exposing aggregated elements (e.g., keyword)
Reuse existing vocabularies
Test the ontology in an implementation
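As an illustration of what exposing a hierarchical schema involves, the sketch below walks a nested LOM-like record and emits one triple per leaf value. It assumes one possible strategy for aggregated elements, namely repeating the property once per value. The namespace, the property naming scheme, and the sample record are hypothetical and are not taken from the thesis ontology.

```python
# Hypothetical namespace; the IRIs of the thesis ontology may differ.
LOM = "http://example.org/lom#"


def lom_to_triples(subject, node, path=()):
    """Walk a nested LOM-like record and emit one triple per leaf value."""
    triples = []
    for key, value in node.items():
        current = path + (key,)
        if isinstance(value, dict):
            # Nested category (e.g. general -> title): recurse into it.
            triples.extend(lom_to_triples(subject, value, current))
        elif isinstance(value, list):
            # Aggregated element (e.g. keyword): one triple per value.
            for item in value:
                triples.append((subject, LOM + "_".join(current), item))
        else:
            triples.append((subject, LOM + "_".join(current), value))
    return triples


record = {
    "general": {
        "identifier": "108450",
        "title": "Nuclear Energy",
        "keyword": ["energy", "physics"],
    }
}
# The identifier element supplies the resource URI.
subject = "http://example.org/resource/" + record["general"]["identifier"]
triples = lom_to_triples(subject, record)
for triple in triples:
    print(triple)
```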
16. A case study based on the ontology
17. Remarks of this investigation
Analyzing the IEEE LOM schema for the sake of:
exposing its elements as Linked Open Data
creating a complete ontology
identifying the appropriate elements for interlinking
The exposing approach was applied to other schemas as well.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Resources to Web of Data through IEEE LOM”. Computer Science and Information Systems, vol. 12, No. 1, pp. 233–255, 2015.
19. The LOD datasets analysis
We analyzed the Linked Open Data cloud to determine:
1. Which datasets in the cloud are the most important to link to in an educational domain?
Examining the LOD cloud using Social Network Analysis (SNA)
2. Which educational datasets are appropriate for interlinking?
Selecting a set of educational datasets in datahub using some metrics
20. The LOD datasets analysis
We considered the LOD cloud as a directed graph and analyzed it according to the following SNA metrics:
Betweenness Centrality (BC): if a dataset has a high BC value, then many datasets are connected through it to others.
In-Degree: the number of datasets that point to the current dataset
Out-Degree: the number of datasets that the current dataset points to
Dataset | In-Degree | Out-Degree | Betweenness Centrality
DBpedia | 181 | 30 | 82,664
Geonames | 55 | 0 | 10,958
DrugBank | 8 | 12 | 7,446
Bio2rdf-goa | 11 | 8 | 3,751
Ordnance-survey | 16 | 0 | 3,272
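The three metrics can be computed on a toy directed graph as sketched below. The graph and its node names are invented for illustration and do not reproduce the LOD cloud. Betweenness is computed with Brandes' algorithm for unweighted directed graphs and left unnormalized; the slides do not say which tool or normalization was used for the reported values, so the numbers here are not comparable to the table.

```python
from collections import deque


def degrees(graph):
    """Return (in_degree, out_degree) for a graph given as {node: [targets]}.

    Every node, including pure targets, must appear as a key.
    """
    in_deg = {n: 0 for n in graph}
    out_deg = {n: len(set(targets)) for n, targets in graph.items()}
    for targets in graph.values():
        for target in set(targets):
            in_deg[target] += 1
    return in_deg, out_deg


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes, unweighted, directed)."""
    bc = {n: 0.0 for n in graph}
    for source in graph:
        stack = []
        preds = {n: [] for n in graph}
        sigma = {n: 0 for n in graph}  # number of shortest paths from source
        dist = {n: -1 for n in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:  # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in set(graph[v]):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in graph}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                bc[w] += delta[w]
    return bc


# Toy graph: two repositories link to a hub, which links to one more dataset.
graph = {"repoA": ["hub"], "repoB": ["hub"], "hub": ["geo"], "geo": []}
in_deg, out_deg = degrees(graph)
bc = betweenness(graph)
print(in_deg, out_deg, bc)
```

In this toy graph the hub lies on both shortest paths that reach `geo`, which is the property that a high BC value captures.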
21. The LOD datasets analysis
[Figure: LOD cloud graph; label: High BC]
22. Selecting educational datasets
Exploring the LOD cloud to find educational datasets using the following steps:
Finding the datasets in datahub tagged with educational subjects
Checking the availability of their SPARQL endpoints or RDF dumps
Retrieving their specification (size, metadata schema, language…) from an interlinking point of view using SPARQL
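Retrieving a dataset's size over SPARQL can be sketched as follows. The query counts all triples, which is only one way to measure size and may be slow or refused on large endpoints. The helper name is hypothetical, the endpoint URL is the Open University one listed in these slides and may no longer be online, and the request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Counts every triple in the default graph.
COUNT_TRIPLES = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request("http://data.open.ac.uk/sparql", COUNT_TRIPLES)
print(request.full_url)
# To execute it, pass the request to urllib.request.urlopen(request, timeout=30)
# and parse the JSON body; a failure or timeout also answers the availability check.
```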
24. Educational datasets bubble graph
Selecting 20 available educational datasets
25. Getting datasets specification using SPARQL
Dataset | Size (triples) | SPARQL Endpoint
Charles University in Prague | 93,233,661 | http://linked.opendata.cz/sparql
UNISTAT-KIS | 8,026,637 | http://data.linkedu.eu/kis/query
Achievement Standards Network (ASN) | 7,494,201 | http://sparql.jesandco.org:8890/sparql
Data.gov.uk | 6,619,847 | http://services.data.gov.uk/education/sparql
University of Southampton | 5,726,668 | http://sparql.data.southampton.ac.uk/
Yovisto - academic video search | 4,932,352 | http://sparql.yovisto.com/
University of Muenster (LODUM) | 4,179,372 | http://data.uni-muenster.de/sparql/
Open University in UK | 3,588,626 | http://data.open.ac.uk/sparql
University of Huddersfield | 3,553,343 | http://data.linkedu.eu/hud/query
Semantic ISVU (Kent) | 2,421,268 | http://kent.zpr.fer.hr:8080/educationalProgram/sparql
University of Bristol | 1,885,124 | http://resrev.ilrt.bris.ac.uk/data-server-workshop/sparql
Aalto University | 1,589,122 | http://data.aalto.fi/sparql
Open Courseware Consortium metadata | 636,453 | http://data.linkedu.eu/ocw/query
OxPoints (University of Oxford) | 318,392 | https://data.ox.ac.uk/sparql/
TheSoz Thesaurus for the Social Sciences (GESIS) | 305,329 | http://lod.gesis.org/thesoz/sparql
PROD | 62,375 | http://data.linkedu.eu/prod/query
Open Data @ Tor Vergata | 54,968 | http://opendata.ccd.uniroma2.it/LMF/sparql/select
Vytautas Magnus University, Kaunas | 39,279 | http://kaunas.rkbexplorer.com/sparql/
MoreLab | 3,906 | http://www.morelab.deusto.es/joseki/articles
Forge project | 132 | http://data.linkedu.eu/forge/query
26. Getting entities from the datasets
Open University of UK endpoint
27. Remarks of this investigation
Selecting the DBpedia dataset as the LOD hub for interlinking educational datasets
Identifying a set of well-formed educational datasets for interlinking
E. Rajabi, S. Sanchez-Alonso, and M.-A. Sicilia, “Analyzing Broken Links on the Web of Data: an Experiment with DBpedia,” Journal of the Association for Information Science and Technology (JASIST), vol. 65, no. 8, pp. 1721–1727, 2014.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Discovering Duplicate and Related Resources using Interlinking Approach: The case of Educational Datasets,” Journal of Information Science, first published on March 10, 2015.
29. Interlinking tools (comparison)
Tool | Domain | SPARQL/RDF Dump | Manual/Automatic | Well-documented | Customization flexibility
GWAP | Multimedia | No | Manual | No | Unknown
LIMES | LOD | Yes | Automatic | Yes | Yes
LOD Refine | General | Yes | Automatic | Yes | Partially
RDF-IA | LOD | RDF Dump | Automatic | No | Unknown
SAI | Multimedia | No | Automatic | No | Unknown
Silk | LOD | Yes | Automatic | Yes | Yes
UCI | LOD | Yes | Manual | No | Unknown
30. Interlinking tools (general idea)
Source
• Source data type: RDF dump
• Source entity: dcterms:title
• Filtering: English titles
Target
• Target data type: SPARQL Endpoint
• Target entity: dcterms:title
• Filtering: English titles
Setting
• Matching algorithm: Trigrams
• Threshold of acceptance: 95%
• Output file format: N-TRIPLE
• ...
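The configuration above boils down to comparing titles with a trigram measure and keeping pairs above the threshold. A minimal Python sketch follows, assuming a Dice coefficient over sets of character trigrams; the exact trigram metric implemented by Silk, LIMES, or LOD Refine may differ. The URIs and titles are made up, and the all-pairs loop is only suitable for small inputs.

```python
def trigrams(text):
    """Return the set of character trigrams of a lower-cased, trimmed string."""
    cleaned = text.lower().strip()
    return {cleaned[i:i + 3] for i in range(len(cleaned) - 2)}


def trigram_similarity(a, b):
    """Dice coefficient over trigram sets; 0.0 if either string has fewer than 3 characters."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


THRESHOLD = 0.95  # threshold of acceptance from the slide


def link_titles(source_titles, target_titles, threshold=THRESHOLD):
    """Compare every source title with every target title; keep pairs at or above the threshold."""
    links = []
    for s_uri, s_title in source_titles.items():
        for t_uri, t_title in target_titles.items():
            score = trigram_similarity(s_title, t_title)
            if score >= threshold:
                links.append((s_uri, t_uri, score))
    return links


# Hypothetical source and target resources keyed by URI.
source = {"http://example.org/globe/1": "Nuclear Energy"}
target = {
    "http://example.org/target/a": "nuclear energy",
    "http://example.org/target/b": "Nuclear Physics",
}
links = link_titles(source, target)
print(links)
```

A high threshold such as 95% keeps near-exact matches only, which trades recall for precision.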
32. The interlinking tools (SILK)
33. The interlinking tools (LIMES)
[Figure labels: Source & Target datasets; Condition]
34. The interlinking tools (LOD Refine)
35. Sample interlinking results (exact matched)
Title in both datasets | Globe resource | Target URI | Dataset name
Nuclear Energy | http://www.globe-info.org/ont/lom2owl#108450 | http://schools.nyc.gov/NR/rdonlyres/6C64098F-0C24-4B27-A22F-F542A2F97DA0/130926/TTS_G11_LiteracySSandScience_NuclearEnergy.pdf | ASN
Nuclear Energy | http://www.globe-info.org/ont/lom2owl#108450 | http://resrev.ilrt.bris.ac.uk/research-revealed-hub/publications/118933#pub | Bristol
Nuclear Energy | http://www.globe-info.org/ont/lom2owl#108450 | http://data.linkedu.eu/hud/book/118555 | Huddersfield
Bibliography | http://www.globe-info.org/ont/lom2owl#178214 | http://resrev.ilrt.bris.ac.uk/research-revealed-hub/publications/15140#pub | OpenUK
Bibliography | http://www.globe-info.org/ont/lom2owl#178214 | http://data.uni-muenster.de/context/istg/allegro/6/210/T00244773 | Muenster
36. Evaluating the interlinking tools
We used three tools to interlink GLOBE to DBpedia
[Figure: GLOBE and DBpedia on title]
37. Evaluating the interlinking tools
Does the result change if we use more than one tool?
[Figure: Common results among the tools]
38. Remarks of this investigation
Applying the interlinking tools for linking datasets is a reliable method.
Silk and LIMES proved efficient for similarity discovery among the LOD datasets.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “An empirical study on the evaluation of interlinking tools on the Web of Data,” Journal of Information Science, vol. 40, pp. 637–648, 2014, first published on June 11, 2014.
40. Evaluating the interlinking results
Interlinking tools perform an interlinking process and print out the matched resources.
The question under discussion is to what extent the results are reliable.
An important step after interlinking is the evaluation of the results by human domain experts.
42. GLOBE metadata analysis
Creating criteria for finding appropriate elements for interlinking (datatype, completeness, content)
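Of the three criteria, completeness is the easiest to quantify. The sketch below assumes completeness means the fraction of records in which an element is present and non-empty; the slides do not give the exact definition used. The records are made up and are not GLOBE data.

```python
def completeness(records, element):
    """Fraction of records in which the element is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(element))
    return filled / len(records)


# Made-up metadata records for illustration only.
records = [
    {"title": "Nuclear Energy", "keyword": ["energy"]},
    {"title": "Bibliography", "keyword": []},
    {"title": "", "keyword": ["history"]},
    {"title": "Algebra"},
]
report = {e: completeness(records, e) for e in ("title", "keyword", "coverage")}
print(report)
```

An element that is rarely filled in offers few values to compare, so it is a weak candidate for interlinking regardless of its datatype or content.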
44. Interlinking results
Title Keyword Taxon Coverage
GLOBE 8,260 228,352 134,791 12,941
Percentage 2% 74% 76% 78%
Interlinking through the Keyword element
45. Evaluating the interlinking results
We evaluated the interlinking results from the following perspectives:
Reliability
Level of agreement between the raters
Relationship among results (e.g., threshold 75%)
Is parent of, Is related to, Is part of
Enrichment of content
Linking one resource to many datasets on the Web
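The slides do not name the statistic used for the level of agreement between the raters. One common choice for two raters, assumed here purely for illustration, is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The ratings below are invented.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' label frequencies, summed over labels.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented judgements on eight interlinking results.
rater_1 = ["ok", "ok", "ok", "wrong", "ok", "wrong", "ok", "ok"]
rater_2 = ["ok", "ok", "ok", "wrong", "ok", "ok", "ok", "ok"]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 3))
```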
46. Remarks of this investigation
Human experts (the results raters) agreed that the interlinking results are reliable.
Interlinking a learning repository to several educational datasets in the LOD cloud leads to the enrichment of content.
Interlinking results can help find duplicate metadata.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Discovering Duplicate and Related Resources using Interlinking Approach: The case of Educational Datasets,” Journal of Information Science, first published on March 10, 2015.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Data: an Experiment with Engineering-related Resources in GLOBE,” International Journal of Engineering and Education, Vol 31-3, 2015.
47. Conclusions
1. Exposing eLearning metadata as Linked Open Data
A complete analysis was done on exposing the IEEE LOM schema as RDF.
A new ontology was designed for RDF binding
of IEEE LOM.
Keyword, Coverage, Classification, and Title
were appropriate elements for interlinking.
48. Conclusions (cont.)
2. Evaluating Linked Data tools & datasets
Silk and LIMES were efficient frameworks for discovering similarities between two datasets.
DBpedia was identified as the hub of the LOD cloud.
Twenty educational datasets were identified as the most suitable targets for interlinking.
The Open University (UK) dataset includes a rich metadata schema and a reliable endpoint.
49. Conclusions (cont.)
3. Enriching the educational datasets
Interlinking results were reviewed and verified by human experts.
Several educational resources were linked to more than one resource in the LOD cloud.
A duplicate identification approach was proposed after the analysis of the interlinking results.
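The slides do not describe how duplicates were identified from the interlinking results. One simple heuristic, assumed here only as an illustration, is to flag source resources that were linked to the same target as duplicate candidates for human review. All URIs are made up.

```python
from collections import defaultdict


def duplicate_candidates(links):
    """Group source URIs by linked target; groups of two or more are duplicate candidates."""
    by_target = defaultdict(set)
    for source_uri, target_uri in links:
        by_target[target_uri].add(source_uri)
    return {
        target: sorted(sources)
        for target, sources in by_target.items()
        if len(sources) > 1
    }


# Hypothetical interlinking output as (source, target) pairs.
links = [
    ("http://example.org/globe/1", "http://example.org/target/x"),
    ("http://example.org/globe/2", "http://example.org/target/x"),
    ("http://example.org/globe/3", "http://example.org/target/y"),
]
candidates = duplicate_candidates(links)
print(candidates)
```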
50. Additional contributions
Implementing several platforms for exposing data as Linked Data
Organic.Edunet (http://data.organic-edunet.eu)
ARIADNE (http://ariadne.grnet.gr)
Open Discovery Space (http://data.opendiscoveryspace.eu)
Agrega (http://agrega2.red.es/)
Submitting the IEEE LOM ontology to Linked Open Vocabularies (LOV) at http://lov.okfn.org/dataset/lov/vocabs/lom
Developing an online mashup for interlinking eLearning objects to the Web of Data (research stay at Agroknow)
Writing a book chapter about “Optimizing Big Data using the Linked Data approach”
52. Future work
Content
Applying the interlinking approach to other
educational repositories
Tools and software
Extending the tools to link one dataset to several datasets at the same time
Adding some semantic similarity services to tools to improve the interlinking results
Linking educational resources by crawling datasets
53. Publications (journal papers)
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Resources to Web of Data through IEEE LOM,” Computer Science and Information Systems, vol. 12, no. 1, pp. 233–255, 2015.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Discovering Duplicate and Related Resources using Interlinking Approach: The case of Educational Datasets,” Journal of Information Science, first published on March 10, 2015, doi:10.1177/0165551515575922.
E. Rajabi, S. Sanchez-Alonso, and M.-A. Sicilia, “Analyzing Broken Links on the Web of Data: an Experiment with DBpedia,” Journal of the Association for Information Science and Technology (JASIST), vol. 65, no. 8, pp. 1721–1727, 2014, doi:10.1002/asi.23109.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “An empirical study on the evaluation of interlinking tools on the Web of Data,” Journal of Information Science, vol. 40, pp. 637–648, 2014, first published on June 11, 2014, doi:10.1177/0165551514538151.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Data: an Experiment with Engineering-related Resources in GLOBE,” International Journal of Engineering and Education, 2015. In press.
E. Rajabi, W. Greller, K. Niemann, K. Kastrantas, and S. Sanchez-Alonso, “Social data interoperability in educational repositories and federations,” International Journal of Metadata, Semantics and Ontologies, vol. 8, no. 2, pp. 169–178, 2013.
E. Rajabi, S. Sanchez-Alonso, M.-A. Sicilia, and N. Manouselis, “A linked and open dataset from a network of learning repositories on organic agriculture,” British Journal of Educational Technology, submitted (under second review).
M.-C. Valiente, M.-A. Sicilia, E. Garcia-Barriocanal, and E. Rajabi, “Adopting the metadata approach to improve the search and analysis of educational resources for online learning,” Computers in Human Behavior, 2015. In press.
54. Publications (conference papers)
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Data: an Experiment with GLOBE Resources,” presented at the First International Conference on Technological Ecosystem for Enhancing Multiculturality, Salamanca, Spain, 2013.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Research Objects Interlinking: The Case of Dryad Repository,” presented at Metadata and Semantics Research, Karlsruhe, Germany, 2014.
E. Rajabi and S. Sanchez-Alonso, “Enriching the e-learning contents using interlinking,” presented at 5th eLearning conference, Belgrade, Serbia, 2014. Link: https://scholar.google.es/scholar?oi=bibs&cluster=16249634834288673991&btnI=1&hl=en
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “A Simple Approach towards SKOSification of Digital Repositories,” Metadata and Semantics Research, pp. 67–74, 2013.
M.-A. Sicilia, S. Sanchez-Alonso, E. Garcia-Barriocanal, J. Minguillón, and E. Rajabi, “Exploring the keyword space in large learning resource aggregations: the case of GLOBE,” Lacro workshop, April 2013.