Shape Expressions: An RDF validation and transformation language
1. Shape Expressions: An RDF validation and transformation language
Eric Prud'hommeaux
World Wide Web Consortium
MIT, Cambridge, MA, USA
eric@w3.org
Harold Solbrig
Mayo Clinic College of Medicine, Rochester, MN, USA
Jose Emilio Labra Gayo
WESO Research group
University of Oviedo
Spain
labra@uniovi.es
2. This talk in 1 slide
Motivating example:
Represent issues and users in RDF
...and validate that data
Shape Expressions = simple language to:
Describe the topology of RDF data
Validate if an RDF graph matches a given shape
Shape expressions can be extended with actions
Possible application: transform RDF into XML
3. Motivating example
Represent an issue tracking system in RDF
Issues are reported by users on some date
Issues have some status (assigned/unassigned)
Issues can also be reproduced on some date by users
Entities: User, Issue
4. E-R Diagram
User
foaf:name: xsd:string
foaf:givenName: xsd:string*
foaf:familyName: xsd:string
foaf:mbox: IRI
Issue
:status: (:Assigned :Unassigned)
:reportedOn: xsd:date
:reproducedOn: xsd:date
Relationships (end cardinalities as labeled in the diagram):
:reportedBy 1, 0..*
:reproducedBy 0..*, 0..1
:related 0..*, 0..1
...and several constraints
A user:
- has a full name, or several given names and one family name
- can have one mbox
An issue:
- has status Assigned or Unassigned
- is reported by a user
- is reported on a date
- can be reproduced by a user on a date
- is related to other issues
6. Problem statement
We want to detect possible errors in RDF like:
Issues without status
Issues with a status other than Assigned/Unassigned
Issues reported by something other than a user
Issues reported on a date with a non-date type
Issues reproduced on a date before the reported date
Users without mbox
Users with 2 names
Users with a name of type integer
...lots of other errors...
Q: How can we describe RDF data to be able to detect those errors?
A: Our proposal = Shape Expressions
7. Shape Expressions - Users
A user can have either:
one foaf:name
or one or more foaf:givenName and one foaf:familyName
all of them must be of type xsd:string
A user can have one foaf:mbox whose value is any IRI
<UserShape> {
( foaf:name xsd:string
| foaf:givenName xsd:string+
, foaf:familyName xsd:string
)
, foaf:mbox IRI ?
}
The example uses compact syntax
Shape Expressions can also be represented in RDF
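The UserShape constraints can be restated as a small check. The following is a minimal Python sketch, not the authors' validator. It assumes a node is given as a dict from predicate names to lists of (value, datatype) pairs, with the datatype "IRI" standing in for IRI nodes, and it reads the two name alternatives as mutually exclusive. The function names all_strings and conforms_to_user_shape are hypothetical.

```python
# Minimal sketch of the <UserShape> constraints; not a ShEx implementation.
# Assumed representation: {predicate: [(lexical_value, datatype), ...]}.
# The datatype "IRI" is a stand-in used here to mark IRI nodes.

def all_strings(values):
    """True if every value is typed xsd:string."""
    return all(dt == "xsd:string" for _, dt in values)

def conforms_to_user_shape(node):
    name = node.get("foaf:name", [])
    given = node.get("foaf:givenName", [])
    family = node.get("foaf:familyName", [])
    mbox = node.get("foaf:mbox", [])

    # Alternative 1: exactly one foaf:name.
    # Assumption: the alternatives are exclusive, so no given/family names.
    full_name = (len(name) == 1 and all_strings(name)
                 and not given and not family)
    # Alternative 2: one or more foaf:givenName and exactly one foaf:familyName.
    split_name = (len(given) >= 1 and all_strings(given)
                  and len(family) == 1 and all_strings(family)
                  and not name)
    # foaf:mbox IRI ? : zero or one value, which must be an IRI.
    mbox_ok = len(mbox) <= 1 and all(dt == "IRI" for _, dt in mbox)
    return (full_name or split_name) and mbox_ok
```

A user with two foaf:name values, or with a name typed as an integer, fails this check, which corresponds to two of the errors listed in the problem statement.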
8. Shape Expressions - Issues
An issue's :status must be either :Assigned or :Unassigned
Issues are :reportedBy a user
Issues are :reportedOn an xsd:date
An issue may be :reproducedBy a user and :reproducedOn an xsd:date
An issue can be :related to several issues
<IssueShape> {
:status (:Assigned :Unassigned)
, :reportedBy @<UserShape>
, :reportedOn xsd:date
, ( :reproducedBy @<UserShape>
, :reproducedOn xsd:date
)?
, :related @<IssueShape>*
}
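The IssueShape adds shape references, an optional group, and a recursive reference through :related. The sketch below illustrates those three points in Python under stated assumptions: the graph is a dict of node to predicate to list of objects, literals are (value, datatype) tuples, is_user is a hypothetical stand-in for "has shape <UserShape>", and a node that is already being checked is assumed to conform so that cycles through :related terminate. It is not the authors' algorithm.

```python
# Minimal sketch of <IssueShape>; not a ShEx implementation.
# Assumed representation: graph = {node: {predicate: [object, ...]}}.
# Literals are (lexical_value, datatype) tuples; node names are strings.

def is_date(value):
    return isinstance(value, tuple) and value[1] == "xsd:date"

def conforms_to_issue_shape(graph, node, is_user, visiting=frozenset()):
    if node in visiting:
        # Assumption: a node already under test is taken to conform,
        # so cycles through :related terminate.
        return True
    visiting = visiting | {node}
    arcs = graph.get(node, {})
    status = arcs.get(":status", [])
    reported_by = arcs.get(":reportedBy", [])
    reported_on = arcs.get(":reportedOn", [])
    reproduced_by = arcs.get(":reproducedBy", [])
    reproduced_on = arcs.get(":reproducedOn", [])
    related = arcs.get(":related", [])

    required = (
        len(status) == 1 and status[0] in (":Assigned", ":Unassigned")
        and len(reported_by) == 1 and is_user(reported_by[0])
        and len(reported_on) == 1 and is_date(reported_on[0])
    )
    # ( :reproducedBy ..., :reproducedOn ... )? : both arcs or neither.
    absent = not reproduced_by and not reproduced_on
    present = (len(reproduced_by) == 1 and is_user(reproduced_by[0])
               and len(reproduced_on) == 1 and is_date(reproduced_on[0]))
    # :related @<IssueShape>* : every related node must itself be an issue.
    related_ok = all(
        conforms_to_issue_shape(graph, r, is_user, visiting) for r in related
    )
    return required and (absent or present) and related_ok
```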
10. FAQ: Why not use SPARQL?
<UserShape> {
( foaf:name xsd:string
| foaf:givenName xsd:string+
, foaf:familyName xsd:string
)
, foaf:mbox IRI ?
}
<IssueShape> {
:status (:Assigned :Unassigned)
, :reportedBy @<UserShape>
, :reportedOn xsd:date
, ( :reproducedBy @<UserShape>
, :reproducedOn xsd:date
)?
, :related @<IssueShape>*
}
CONSTRUCT {
?IssueShape :hasShape <IssueShape> .
?UserShape :hasShape <UserShape> .
} { { SELECT ?IssueShape {
?IssueShape :status ?o } GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
{ SELECT ?IssueShape {
?IssueShape :status ?o .
FILTER ((?o = :Assigned || ?o = :Unassigned))
} GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c0) {
?IssueShape :reportedBy ?o .
} GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
{ SELECT ?IssueShape {
?IssueShape :reportedBy ?o .
FILTER ((isIRI(?o) || isBlank(?o)))
} GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c1) {
{ SELECT ?IssueShape ?UserShape {
?IssueShape :reportedBy ?UserShape .
FILTER (isIRI(?UserShape) || isBlank(?UserShape))
} }
{ SELECT ?UserShape WHERE {
{ { SELECT ?UserShape {
?UserShape foaf:name ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
{ SELECT ?UserShape {
?UserShape foaf:name ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
} UNION {
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c0) {
?UserShape foaf:givenName ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)>=1)}
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c1)
{ ?UserShape foaf:givenName ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
} GROUP BY ?UserShape
HAVING (COUNT(*)>=1)}
FILTER (?UserShape_c0 = ?UserShape_c1)
{ SELECT ?UserShape {
?UserShape foaf:familyName ?o .
} GROUP BY ?UserShape
HAVING (COUNT(*)=1)}
{ SELECT ?UserShape {
?UserShape foaf:familyName ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
}
} GROUP BY ?UserShape HAVING (COUNT(*) = 1)}
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c2)
{
?UserShape foaf:mbox ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)<=1)}
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c3)
{
?UserShape foaf:mbox ?o .
FILTER (isIRI(?o))
} GROUP BY ?UserShape HAVING (COUNT(*)<=1)}
FILTER (?UserShape_c2 = ?UserShape_c3)
} GROUP BY ?IssueShape }
FILTER (?IssueShape_c0 = ?IssueShape_c1)
OPTIONAL {
?IssueShape :reportedBy ?IssueShape_UserShape_ref0 .
FILTER (isIRI(?IssueShape_UserShape_ref0)
|| isBlank(?IssueShape_UserShape_ref0)) }
{ SELECT ?IssueShape {
?IssueShape :reportedOn ?o } GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
{ SELECT ?IssueShape {
?IssueShape :reportedOn ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:date))
} GROUP BY ?IssueShape HAVING (COUNT(*)=1)} {
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c2) {
?IssueShape :reproducedBy ?o .
} GROUP BY ?IssueShape}
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c3) {
?IssueShape :reproducedBy ?o .
FILTER ((isIRI(?o) || isBlank(?o)))
} GROUP BY ?IssueShape}
FILTER (?IssueShape_c2 = ?IssueShape_c3)
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c5) {
?IssueShape :reproducedOn ?o .
} GROUP BY ?IssueShape}
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c6) {
?IssueShape :reproducedOn ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:date))
} GROUP BY ?IssueShape}
FILTER (?IssueShape_c5 = ?IssueShape_c6)
FILTER (?IssueShape_c2=0 && ?IssueShape_c5=0 ||
?IssueShape_c2>=1&&?IssueShape_c2<=1 &&
?IssueShape_c5>=1&&?IssueShape_c5<=1)
}
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c7) {
?IssueShape :related ?o .
} GROUP BY ?IssueShape}
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c8) {
?IssueShape :related ?o .
} GROUP BY ?IssueShape}
FILTER (?IssueShape_c7 = ?IssueShape_c8)
{ SELECT ?UserShape WHERE {
{ { SELECT ?UserShape {
?UserShape foaf:name ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
{ SELECT ?UserShape {
?UserShape foaf:name ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
} UNION {
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c0) {
?UserShape foaf:givenName ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)>=1)}
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c1) {
?UserShape foaf:givenName ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
} GROUP BY ?UserShape HAVING (COUNT(*)>=1)}
FILTER (?UserShape_c0 = ?UserShape_c1)
{ SELECT ?UserShape {
?UserShape foaf:familyName ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
{ SELECT ?UserShape {
?UserShape foaf:familyName ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
}
} GROUP BY ?UserShape HAVING (COUNT(*) = 1)}
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c2) {
?UserShape foaf:mbox ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)<=1)}
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c3) {
?UserShape foaf:mbox ?o . FILTER (isIRI(?o))
} GROUP BY ?UserShape HAVING (COUNT(*)<=1)}
FILTER (?UserShape_c2 = ?UserShape_c3)
}
...
Shape Expression
Shape Expressions can be converted to SPARQL
But Shape Expressions are simpler and more readable for this problem
11. Shape Expressions Language
Schema = set of Shape Expressions
Shape Expression = labeled pattern
Typical pattern = conjunction of several expressions
Conjunction represented by ,
<IssueShape> {
:status (:Assigned :Unassigned)
, :reportedBy @<UserShape>
, :reportedOn xsd:date
...
}
<label> {
...pattern...
}
Label
Conjunction
12. Arcs
Basic expression: an Arc
Arc = name definition followed by value definition
<IssueShape> {
:status (:Assigned :Unassigned)
, :reportedBy @<UserShape>
, :reportedOn xsd:date
...
}
Each arc = name definition + value definition
Example arcs of node :issue1:
:issue1 :status :Unassigned
:issue1 :reportedBy :bob
:issue1 :reportedOn 23-01-2013
13. Value definition
Value definitions can be:
Kind | Example | Meaning
Value type | xsd:date | Matches a value of type xsd:date
Value set | ( :Assigned :Unassigned ) | The object is an element of the given set
Reference | @<UserShape> | The object has shape <UserShape>
Stem | foaf:~ | Starts with the IRI associated with foaf
Any | - :Checked | Any value except :Checked
<IssueShape> {
:status (:Assigned :Unassigned)
, :reportedBy @<UserShape>
, :reportedOn xsd:date
...
}
Value set
Value reference
Value type
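The five kinds of value definition can each be read as a test on the object of a triple. The following Python sketch illustrates that reading under stated assumptions: IRIs are plain strings with prefixes already expanded, literals are (value, datatype) tuples, and all function names are hypothetical rather than part of any ShEx implementation.

```python
# Sketch of the five kinds of value definition as predicate factories.
# Assumed representation: IRIs are plain strings with prefixes already
# expanded; literals are (lexical_value, datatype_iri) tuples.

FOAF = "http://xmlns.com/foaf/0.1/"   # FOAF namespace IRI

def value_type(datatype):
    """Value type: the object is a literal with the given datatype."""
    return lambda obj: isinstance(obj, tuple) and obj[1] == datatype

def value_set(*members):
    """Value set: the object is one of the listed values."""
    return lambda obj: obj in members

def reference(shape_check):
    """Reference: defer to a callable that decides 'has shape <...>'."""
    return lambda obj: shape_check(obj)

def stem(prefix):
    """Stem: the object is an IRI starting with the given prefix."""
    return lambda obj: isinstance(obj, str) and obj.startswith(prefix)

def any_except(*excluded):
    """Any: every value is accepted except the excluded ones."""
    return lambda obj: obj not in excluded
```

For example, value_set(":Assigned", ":Unassigned") accepts ":Assigned" and rejects ":Closed", while stem(FOAF) accepts any IRI in the FOAF namespace.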
14. Name definition
Name definitions can be:
Kind | Example | Meaning
Name term | foaf:name | Matches the given IRI
Name stem | foaf:~ | Any predicate that starts with the IRI associated with foaf
Name any | - foaf:name | Any predicate except foaf:name
<IssueShape> {
:status (:Assigned :Unassigned)
, :reportedBy @<UserShape>
, :reportedOn xsd:date
...
}
Name terms
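Name definitions play the same role for predicates that value definitions play for objects, and an arc pairs one of each. The Python sketch below shows that pairing under stated assumptions: triples are (subject, predicate, object) tuples of plain strings, and the helper names are hypothetical.

```python
# Sketch: an arc pairs a name definition (a test on the predicate)
# with a value definition (a test on the object).
# Assumed representation: triples are (subject, predicate, object) tuples.

def name_term(iri):
    """Name term: the predicate is exactly the given IRI."""
    return lambda p: p == iri

def name_stem(prefix):
    """Name stem: the predicate starts with the given prefix."""
    return lambda p: p.startswith(prefix)

def name_any(*excluded):
    """Name any: every predicate except the excluded ones."""
    return lambda p: p not in excluded

def matching_triples(triples, subject, name_def, value_def):
    """Triples of `subject` whose predicate and object both match the arc."""
    return [t for t in triples
            if t[0] == subject and name_def(t[1]) and value_def(t[2])]
```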
15. Alternatives
Alternatives (disjunctions) are marked by |
Example 1: An agent has either foaf:name or rdfs:label
<Agent> {
( foaf:name xsd:string | rdfs:label xsd:string )
...
}
Example 2: A list of integers
<listOfInt> {
rdf:first xsd:integer
, ( rdf:rest ( rdf:nil )
| rdf:rest @<listOfInt>
)
}
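The recursive <listOfInt> shape can be traced with a short function. This Python sketch assumes a graph given as a dict from node to its rdf:first and rdf:rest values, with Python ints standing in for xsd:integer literals. Rejecting cyclic rdf:rest chains is a choice made for this sketch, not something the slides specify.

```python
# Sketch of <listOfInt>: rdf:first is an integer and rdf:rest is
# either rdf:nil or another node with the same shape.
# Assumed representation: {node: {"rdf:first": value, "rdf:rest": node}},
# with Python ints standing in for xsd:integer literals.

def is_list_of_int(graph, node, seen=frozenset()):
    if node in seen:          # sketch-level choice: reject cyclic chains
        return False
    cell = graph.get(node)
    if cell is None:
        return False
    first, rest = cell.get("rdf:first"), cell.get("rdf:rest")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(first, int) or isinstance(first, bool):
        return False
    if rest == "rdf:nil":     # first alternative: the list ends here
        return True
    # second alternative: the rest must itself match <listOfInt>
    return is_list_of_int(graph, rest, seen | {node})
```

As written, the shape requires an rdf:first arc, so rdf:nil on its own (the empty list) does not match.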
16. Cardinalities
The same as in common regular expressions
* 0 or more
+ 1 or more
? 0 or 1
{m} m repetitions
{m,n} Between m and n repetitions
<IssueShape> {
...
( :reproducedBy @<UserShape>, :reproducedOn xsd:date)?
, :related @<IssueShape>*
}
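Each cardinality marker corresponds to a pair of bounds on how many times an arc or group may occur. The Python sketch below makes that mapping explicit. It assumes that an expression without a marker means exactly one occurrence, and the function names are hypothetical.

```python
import re

def cardinality_bounds(marker):
    """Map a cardinality marker to (min, max); max is None when unbounded."""
    # Assumption: no marker means exactly one occurrence.
    fixed = {"": (1, 1), "*": (0, None), "+": (1, None), "?": (0, 1)}
    if marker in fixed:
        return fixed[marker]
    m = re.fullmatch(r"\{(\d+)(?:,(\d+))?\}", marker)
    if m is None:
        raise ValueError(f"unknown cardinality marker: {marker!r}")
    low = int(m.group(1))
    # {m} means exactly m repetitions; {m,n} means between m and n.
    high = int(m.group(2)) if m.group(2) is not None else low
    return (low, high)

def count_allowed(count, marker):
    """True if `count` occurrences satisfy the marker."""
    low, high = cardinality_bounds(marker)
    return count >= low and (high is None or count <= high)
```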
17. Semantic actions
Define actions to be executed during validation
<Issue> {
...
:reportedOn xsd:date %js{ report = _.o; return true; %}
, ( :reproducedBy @<UserShape>
, :reproducedOn xsd:date %js{ return _.o.lex > report.lex; %}
) ?
}
%lang{ ...actions... %}
Calls the lang processor, passing it the given actions
Example:
Check that :reportedOn must be before :reproducedOn
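The two %js actions cooperate: the first stores the :reportedOn value and the second compares the :reproducedOn value against it. The same check can be sketched in Python as below. This mirrors the logic only and is not how a ShEx processor invokes actions. It assumes dates in ISO 8601 form (YYYY-MM-DD), and the function name is hypothetical.

```python
from datetime import date

def reproduced_after_reported(reported_on, reproduced_on=None):
    """Mirror of the two %js actions, assuming ISO 8601 (YYYY-MM-DD) dates."""
    # First action: remember the :reportedOn value.
    report = date.fromisoformat(reported_on)
    # The ( :reproducedBy, :reproducedOn )? group is optional.
    if reproduced_on is None:
        return True
    # Second action: :reproducedOn must be later than :reportedOn.
    return date.fromisoformat(reproduced_on) > report
```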
18. Semantics of Shape Expressions
Operational semantics using inference rules
Inspired by the semantics of RelaxNG
Formalism used to define type inference systems
Matching infers shape typings
Axioms and rules of the form:
19. Example: matching rules
Figure: an inference rule, with these annotations:
- Context, graph, type assignment
- The graph can be decomposed into g1 and g2
- Typings t1 and t2 are combined
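The rule itself was a figure, so the following Python sketch is only one reading of its annotations: split the triples into g1 and g2, match one expression against each part, and combine the resulting typings t1 and t2. The brute-force enumeration, the representation of a typing as a dict from node to a set of shape labels, and all names are assumptions of this sketch, not the formal semantics from the talk.

```python
from itertools import combinations

def decompositions(triples):
    """Yield every split of `triples` into two disjoint parts (g1, g2)."""
    triples = list(triples)
    for k in range(len(triples) + 1):
        for idx in combinations(range(len(triples)), k):
            g1 = [triples[i] for i in idx]
            g2 = [t for i, t in enumerate(triples) if i not in idx]
            yield g1, g2

def match_both(triples, match_e1, match_e2):
    """Match e1 on g1 and e2 on g2 for some decomposition; combine typings.

    Each matcher takes a list of triples and returns a typing (a dict from
    node to a set of shape labels), or None when the match fails.
    """
    for g1, g2 in decompositions(triples):
        t1, t2 = match_e1(g1), match_e2(g2)
        if t1 is not None and t2 is not None:
            # Combine typings: union of the labels assigned to each node.
            combined = {node: set(labels) for node, labels in t1.items()}
            for node, labels in t2.items():
                combined.setdefault(node, set()).update(labels)
            return combined
    return None
```

Enumerating all splits is exponential in the number of triples, so this is useful only for tracing the idea on small examples.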
20. Transforming RDF using ShEx
Semantic actions can be combined with
specialized languages
Possible languages: sparql, js
Other examples:
GenX = very simple language to generate XML
Goal: Semantic lowering
Map RDF clinical records to XML
GenJ generates JSON
22. GenX
GenX syntax
$IRI Generates elements in that namespace
<name> Add element <name>
@<name> Add attribute <name>
=<expr> XPath function applied to the value
= Don't emit the value
[-n] Place the value up n values in the hierarchy
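The idea behind semantic lowering, mapping values found during validation to XML elements and attributes, can be illustrated without GenX itself. The Python sketch below is not GenX and does not reproduce its syntax. It uses the standard xml.etree.ElementTree module, and the element names, the id attribute, and the function name are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

def issue_to_xml(issue_id, arcs):
    """Emit one <issue> element from a dict of element name -> text value."""
    # The identifier becomes an attribute, loosely like GenX's @<name>.
    root = ET.Element("issue", {"id": issue_id})
    for name, value in arcs.items():
        # Each value becomes a child element, loosely like GenX's <name>.
        child = ET.SubElement(root, name)
        child.text = value
    return ET.tostring(root, encoding="unicode")
```

Calling issue_to_xml("issue1", {"status": "Unassigned"}) returns the string `<issue id="issue1"><status>Unassigned</status></issue>`.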
25. Current Implementations
Name | Main developer | Language | Features
FancyDemo | Eric Prud'hommeaux | Javascript | First implementation; semantic actions (GenX, GenJ); conversion to SPARQL; http://www.w3.org/2013/ShEx/
JsShExTest | Jesse van Dam | Javascript | Supports RDF and Compact syntax; https://github.com/jessevdam/shextest
ShExcala | Jose E. Labra | Scala | Several extensions: negations, reverse arcs, relations,...; efficient implementation using Derivatives; http://labra.github.io/ShExcala/
Haws | Jose E. Labra | Haskell | Prototype to check inference semantics; http://labra.github.io/haws/
26. Applications to linked data portals
2 data portals: WebIndex and LandPortal
Data portal documentation:
http://weso.github.io/wiDoc/
http://weso.github.io/landportalDoc/data
<Observation> {
cex:md5-checksum xsd:string
, cex:computation @<Computation>
, dcterms:issued xsd:integer
, dcterms:publisher ( wi-org:WebFoundation )
, qb:dataSet @<Dataset>
, rdfs:label (@en)
, sdmx-concept:obsStatus @<ObsStatus>
, wi-onto:ref-area @<Area>
, wi-onto:ref-indicator @<Indicator>
, wi-onto:ref-year xsd:int
, cex:value xsd:double
, a ( qb:Observation )
}
<Observation> {
cex:ref-area @<Area>
, cex:ref-indicator @<Indicator>
, cex:ref-time @<Time>
, cex:value xsd:double?
, cex:computation @<Computation>
, dcterms:issued xsd:dateTime
, qb:dataSet @<DataSet>
, qb:slice @<Slice>
, rdfs:label xsd:string
, lb:source @<Upload>
, a ( qb:Observation )
}
Same type: qb:Observation
...but different shapes
More info: paper at the Linked Data Quality Workshop
27. Conclusions
Shape Expressions = simple language
One goal: Describe and validate RDF graphs
Semantics of Shape Expressions
Described using inference rules
...but Shape Expressions can be converted to SPARQL
Compatible with other Semantic technologies
Semantic actions = Extensibility mechanism
Can be applied to transform RDF
28. Future Work
Improve implementations and language
Debugging and error messages
Expressiveness and usability of language
Performance evaluation
Shape Expressions = role similar to Schema for XML
Future applications:
Online validators
Interface generators
Binding: generate parsers/tools from shapes
Performance of RDF triplestores?
29. Future work at W3C
RDF Data Shapes WG chartered
Mailing list: public-rdf-shapes@w3.org
"The discussion on public-rdf-shapes@w3.org is the best entertainment since years;
Game of Thrones colors pale." Paul Hermans (@PaulZH)
30. End of presentation
Slides available at:
http://www.slideshare.net/jelabra/semantics-2014