2. What is RDB2RDF?
Person
ID  NAME   AGE   CID
1   Alice  25    100
2   Bob    NULL  100
City
CID  NAME
100  Austin
200  Madrid
Resulting RDF graph:
<Person/1> foaf:name "Alice" ; foaf:age 25 ; foaf:based_near <City/100> .
<Person/2> foaf:name "Bob" .
<City/100> foaf:name "Austin" .
<City/200> foaf:name "Madrid" .
3. Context
• RDF Data Management
– Relational Database to RDF (RDB2RDF): Wrapper Systems, Extract-Transform-Load (ETL)
– Triplestores: RDBMS-backed Triplestores, Native Triplestores, NoSQL Triplestores
4. Outline
• Scenarios
• W3C RDB2RDF Standards
– Direct Mapping
– R2RML
• ETL and Wrapper Systems
• Use Cases
– RNA Databases
– MusicBrainz
10. W3C RDB2RDF Standards
• Standards to map relational data to RDF
• A Direct Mapping of Relational Data to RDF
– Default automatic mapping of relational data to RDF
• R2RML: RDB to RDF Mapping Language
– Customizable language to map relational data to RDF
13. Table Triple
Person
ID (pk)  NAME   AGE
1        Alice  25
2        Bob    NULL
<http://www.ex.com/Person/ID=1> rdf:type <http://www.ex.com/Person> .
Row IRI pattern: Base IRI + “Table Name”/“PK attr”=“PK value”
Note: If there is no PK, a fresh blank node is generated for every row.
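The row IRI pattern and the blank-node fallback can be sketched in Python. This is a minimal illustration rather than the normative W3C algorithm: the function names `row_node` and `table_triple` are hypothetical, a single-column primary key is assumed, and percent-encoding is simplified to `urllib.parse.quote`.

```python
from itertools import count
from urllib.parse import quote

BASE = "http://www.ex.com/"   # base IRI used on the slide
_bnode_ids = count(1)         # source of fresh blank node labels


def row_node(table, row, pk=None):
    """Return the subject for a row: an IRI if the table has a PK, else a fresh blank node."""
    if pk is None:
        return f"_:b{next(_bnode_ids)}"
    value = quote(str(row[pk]), safe="")
    return f"<{BASE}{quote(table)}/{quote(pk)}={value}>"


def table_triple(table, row, pk=None):
    """Return the rdf:type triple stating which table the row came from."""
    return (row_node(table, row, pk), "rdf:type", f"<{BASE}{quote(table)}>")


alice = {"ID": 1, "NAME": "Alice", "AGE": 25}
print(" ".join(table_triple("Person", alice, pk="ID")), ".")
```

Calling `row_node` without `pk` returns a new blank node label on every call, which mirrors the note that rows of a table without a primary key each get their own fresh node.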
15. Reference Triples
Person
ID (pk)  NAME   AGE   CID (fk)
1        Alice  25    100
2        Bob    NULL  200
City
CID (pk)  TITLE
100       Austin
200       Madrid
<http://www.ex.com/Person/ID=1> <http://www.ex.com/Person#ref-CID> <http://www.ex.com/City/CID=100> .
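A reference triple links the IRI of a row to the IRI of the row its foreign key points at. The following Python sketch shows that construction under simplifying assumptions: single-column keys only, no percent-encoding, and hypothetical helper names `row_iri` and `reference_triple`.

```python
BASE = "http://www.ex.com/"


def row_iri(table, pk, value):
    """IRI of a row, following the Table/PK=value pattern."""
    return f"<{BASE}{table}/{pk}={value}>"


def reference_triple(table, pk, row, fk, ref_table, ref_pk):
    """Return the triple linking a row to the row its foreign key references.

    Returns None when the foreign key is NULL (None): there is nothing to link to.
    """
    if row[fk] is None:
        return None
    return (
        row_iri(table, pk, row[pk]),
        f"<{BASE}{table}#ref-{fk}>",
        row_iri(ref_table, ref_pk, row[fk]),
    )


alice = {"ID": 1, "NAME": "Alice", "AGE": 25, "CID": 100}
print(" ".join(reference_triple("Person", "ID", alice, "CID", "City", "CID")), ".")
```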
16. Direct Mapping Result
Person
ID  NAME   AGE   CID
1   Alice  25    100
2   Bob    NULL  100
City
CID  NAME
100  Austin
200  Madrid
Resulting RDF graph:
<Person/ID=1> <Person#NAME> "Alice" .
<Person/ID=1> <Person#AGE> 25 .
<Person/ID=1> <Person#ref-CID> <City/CID=100> .
<Person/ID=2> <Person#NAME> "Bob" .
<Person/ID=2> <Person#ref-CID> <City/CID=100> .
<City/CID=100> <City#NAME> "Austin" .
<City/CID=200> <City#NAME> "Madrid" .
17. Summary: Direct Mapping
• Default and Automatic Mapping
• URIs are automatically generated
– <table>
– <table#attribute>
– <table#ref-attribute>
– <table/pkAttr=pkValue>
• RDF represents the same relational schema
• RDF can be transformed by SPARQL CONSTRUCT
– The result represents the structure and ontology of the mapping author’s choice
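The idea of reshaping the direct-mapped graph into a chosen vocabulary can be imitated in plain Python. Note that this sketch does not run SPARQL: it swaps in a simple predicate-renaming pass to show what a CONSTRUCT query would achieve. The predicate table and the choice of FOAF terms are assumptions based on the Person/City example.

```python
# Direct-mapped predicate -> target vocabulary term (an assumed mapping for illustration).
PREDICATE_MAP = {
    "<Person#NAME>": "foaf:name",
    "<Person#AGE>": "foaf:age",
    "<Person#ref-CID>": "foaf:based_near",
    "<City#NAME>": "foaf:name",
}


def rewrite(triples):
    """Keep only triples whose predicate has a target term and rename that predicate."""
    return [(s, PREDICATE_MAP[p], o) for (s, p, o) in triples if p in PREDICATE_MAP]


direct = [
    ("<Person/ID=1>", "<Person#NAME>", '"Alice"'),
    ("<Person/ID=1>", "<Person#AGE>", "25"),
    ("<Person/ID=1>", "<Person#ref-CID>", "<City/CID=100>"),
    ("<City/CID=100>", "<City#NAME>", '"Austin"'),
]
for triple in rewrite(direct):
    print(*triple, ".")
```

A real CONSTRUCT query could also mint new subject IRIs or drop unwanted columns; the renaming pass above only covers the predicate side.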
18. What else is missing?
• Relational Schema to OWL is *not* in the W3C standard
• Many-to-Many relationships (binary tables)
• “Ugly” IRIs
20. Create R2RML
• Input
– Knowledge of the database (schema and data)
– Knowledge of the domain ontologies
– Knowledge of mappings
• Output
– R2RML file
• Direct Mapping helps to “bootstrap”
28. R2RML Views
SELECT ID, NAME FROM Person WHERE GENDER = 'F'
ex:Person1 rdf:type ex:Woman .
ex:Person1 foaf:name "Alice" .
29. R2RML View
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<TriplesMap1>
    a rr:TriplesMap;
    rr:logicalTable [ rr:sqlQuery
        """SELECT ID, NAME
           FROM Person WHERE gender = 'F' """];
    rr:subjectMap [
        rr:template "http://www.ex.com/Person/{ID}";
        rr:class <http://www.ex.com/Woman>
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:name;
        rr:objectMap [rr:column "NAME" ]
    ]
    .
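To make the effect of the mapping above concrete, the Python sketch below hand-translates it into code: it runs the logical table query against an in-memory SQLite database and emits the triples the subject map and predicate-object map describe. It is not an R2RML processor; the GENDER values in the sample data and the function name `apply_view` are assumptions.

```python
import sqlite3

# Build a Person table in an in-memory database (GENDER values are assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (ID INTEGER PRIMARY KEY, NAME TEXT, GENDER TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [(1, "Alice", "F"), (2, "Bob", "M")],
)

SQL_QUERY = "SELECT ID, NAME FROM Person WHERE GENDER = 'F'"   # rr:sqlQuery
SUBJECT_TEMPLATE = "http://www.ex.com/Person/{ID}"             # rr:template
SUBJECT_CLASS = "http://www.ex.com/Woman"                      # rr:class
NAME_PREDICATE = "http://xmlns.com/foaf/0.1/name"              # rr:predicate foaf:name


def apply_view(connection):
    """Run the logical table query and emit the triples the map describes."""
    triples = []
    for row_id, name in connection.execute(SQL_QUERY):
        subject = "<" + SUBJECT_TEMPLATE.format(ID=row_id) + ">"
        triples.append((subject, "rdf:type", f"<{SUBJECT_CLASS}>"))
        if name is not None:  # a NULL column value produces no triple
            triples.append((subject, f"<{NAME_PREDICATE}>", f'"{name}"'))
    return triples


for triple in apply_view(conn):
    print(*triple, ".")
```

Only Alice passes the `WHERE` filter, so Bob contributes no triples at all; the filtering happens in SQL before any RDF term is generated.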
30. Summary: R2RML
• Manual and Customizable Language
• Learning Curve
• Direct Mapping bootstraps R2RML
• RDF represents the structure and ontology of the mapping author’s choice
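One way to read "Direct Mapping bootstraps R2RML" is to generate a default R2RML document from the schema and then edit it by hand. The Python sketch below emits such a starting point for a single table. The function name `bootstrap_r2rml`, the `<#...Map>` naming, and the restriction to a single-column primary key with at least one other column are assumptions made for illustration.

```python
BASE = "http://www.ex.com/"


def bootstrap_r2rml(table, pk, columns):
    """Return R2RML (Turtle) text that mirrors the Direct Mapping for one table."""
    lines = [
        "@prefix rr: <http://www.w3.org/ns/r2rml#> .",
        "",
        f"<#{table}Map>",
        "    a rr:TriplesMap ;",
        f'    rr:logicalTable [ rr:tableName "{table}" ] ;',
        "    rr:subjectMap [",
        f'        rr:template "{BASE}{table}/{pk}={{{pk}}}" ;',
        f"        rr:class <{BASE}{table}>",
        "    ] ;",
    ]
    # One predicate-object map per column, using the table#column predicate IRIs.
    pom = []
    for column in columns:
        pom.append(
            "    rr:predicateObjectMap [\n"
            f"        rr:predicate <{BASE}{table}#{column}> ;\n"
            f'        rr:objectMap [ rr:column "{column}" ]\n'
            "    ]"
        )
    lines.append(" ;\n".join(pom) + " .")
    return "\n".join(lines)


print(bootstrap_r2rml("Person", "ID", ["NAME", "AGE"]))
```

The mapping author would then replace the generated predicates and class with terms from a domain ontology, for example swapping `<http://www.ex.com/Person#NAME>` for `foaf:name`.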
31. What else is missing?
• 100 tables x 10 attributes each
• >1000 R2RML mappings
• Lack of R2RML editing tools
43. W3C RDB2RDF
• Task: Integrate data from relational DBMS with Linked Data
• Approach: map from relational schema to semantic vocabulary with R2RML
• Publishing: two alternatives
– Translate SPARQL into SQL on the fly
– Batch transform data into RDF, index and provide SPARQL access in a triplestore
[Figure: Linked Data publishing pipeline. Components shown: Relational DBMS, R2RML Engine, Integrated Data in Triplestore, Vocabulary Mapping, Interlinking, Cleansing, SPARQL Endpoint. Stages: Data acquisition, Publishing, LD Dataset Access.]
44. MusicBrainz Next Gen Schema
• artist: as pre-NGS, but with further attributes
• artist_credit: allows joint credit
• release_group: cf. ‘album’, versus:
• release
• medium
• track
• tracklist
• work
• recording
https://wiki.musicbrainz.org/Next_Generation_Schema
45. Music Ontology
• MusicArtist
– ArtistEvent, member_of
• SignalGroup: ‘Album’ as per Release_Group
• Release
– ReleaseEvent
• Record
• Track
• Work
• Composition
http://musicontology.com/
46. Scale
• MusicBrainz RDF derived via R2RML (300M triples):
lb:artist_member a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT a1.gid, a2.gid AS band
       FROM artist a1
       INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
       INNER JOIN link ON l_artist_artist.link = link.id
       INNER JOIN link_type ON link.link_type = link_type.id
       INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
       WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'"""] ;
  rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap
    [rr:predicate mo:member_of ;
     rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
                   rr:termType rr:IRI]] .
48. R2RML Property Mapping
• Mapping columns to properties can be easy:
lb:artist_name a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT artist.gid, artist_name.name
       FROM artist
       INNER JOIN artist_name ON artist.name = artist_name.id"""] ;
  rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap
    [rr:predicate foaf:name ;
     rr:objectMap [rr:column "name"]] .
49. NGS Advanced Relations
• Major entities (Artist, Release Group, Track, etc.) plus URL are paired (e.g. l_artist_artist)
• Each pairing of instances refers to a Link
• Links have types (cf. RDF properties) and attributes
http://wiki.musicbrainz.org/Advanced_Relationship
50. Advanced Relations Mapping
• Mapping advanced relationships (SQL joins):
lb:artist_member a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT a1.gid, a2.gid AS band
       FROM artist a1
       INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
       INNER JOIN link ON l_artist_artist.link = link.id
       INNER JOIN link_type ON link.link_type = link_type.id
       INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
       WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'"""] ;
  rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap
    [rr:predicate mo:member_of ;
     rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
                   rr:termType rr:IRI]] .
51. Advanced Relations Mapping
• Mapping advanced relationships (SQL joins):
lb:artist_dbpedia a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT artist.gid,
              REPLACE(REPLACE(url, 'wikipedia.org/wiki', 'dbpedia.org/resource'),
                      'http://en.', 'http://') AS url
       FROM artist
       INNER JOIN l_artist_url ON artist.id = l_artist_url.entity0
       INNER JOIN link ON l_artist_url.link = link.id
       INNER JOIN link_type ON link.link_type = link_type.id
       INNER JOIN url ON l_artist_url.entity1 = url.id
       WHERE link_type.gid='29651736-fa6d-48e4-aadc-a557c6add1cb'
         AND url SIMILAR TO 'http://(de|el|en|es|ko|pl|pt).wikipedia.org/wiki/%'"""] ;
  rr:subjectMap lb:sm_artist ;
  rr:predicateObjectMap
    [rr:predicate owl:sameAs ;
     rr:objectMap [rr:column "url"; rr:termType rr:IRI]] .
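The nested REPLACE calls and the SIMILAR TO filter in the query above amount to a small URL rewrite. The Python sketch below reproduces that logic outside the database so it can be tried on sample URLs; the function name `dbpedia_iri` is hypothetical, and a regular expression stands in for the SQL pattern.

```python
import re

# Mirrors the SIMILAR TO filter: only these Wikipedia language editions are linked.
WIKIPEDIA_URL = re.compile(r"http://(de|el|en|es|ko|pl|pt)\.wikipedia\.org/wiki/.*")


def dbpedia_iri(url):
    """Return the DBpedia IRI for a Wikipedia URL, or None if the URL is filtered out."""
    if WIKIPEDIA_URL.fullmatch(url) is None:
        return None
    rewritten = url.replace("wikipedia.org/wiki", "dbpedia.org/resource")
    # The second REPLACE drops the "en." prefix; other language prefixes are kept.
    return rewritten.replace("http://en.", "http://")


print(dbpedia_iri("http://en.wikipedia.org/wiki/The_Beatles"))
print(dbpedia_iri("http://de.wikipedia.org/wiki/The_Beatles"))
```

English URLs end up on `dbpedia.org`, while the other accepted languages keep their prefix (for example `de.dbpedia.org`), which matches what the two REPLACE calls do in the SQL.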
52. SPARQL Example
• SPARQL versus SQL
ASK { dbp:Paul_McCartney mo:member_of dbp:The_Beatles }
SELECT …
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
WHERE AND … AND … AND … AND …
53. Upcoming Tutorials
• ESWC – Montpellier, France
– May 27, 2013
• SemTechBiz – San Francisco, USA
– June 2, 2013
• More info: www.rdb2rdf.org
54. For exercises, quiz and further material visit our website:
http://www.euclid-project.eu (eBook, Course)
Other channels: @euclid_project, EUCLID project, EUCLIDproject