Optimized Index Structures for Querying RDF from the Web

Presented by: Mahdi Atawna

About the Paper
 The paper was published at the Third Latin American Web Congress in 2005.
 It has 56 citations.
 Authors: Andreas Harth (National University of Galway, Ireland) and Prof. Stefan Decker (National University of Galway, Ireland).

Outline
 Overview of Semantic Web.
 Overview of Indexes.
 Paper motivation.
 Methodology.
 Experiment & Result.
 Conclusion.

Semantic Web
 Also called:
 Web 3.0.
 The Linked Data Web.
 The Web of Data.
 Whatever you call it, it is the next major evolution in connecting information.

Why semantic web?
 It enables data to be linked from a source to any
other source.
 It can be understood by computers so that they can
perform increasingly sophisticated tasks on our
behalf.

[Figure: Linked Open Data cloud diagram. Source: http://lod-cloud.net]

Semantic Web Standards
 RDF (Resource Description Framework): The data modeling language for the Semantic Web (like UML). All Semantic Web information is stored and represented in RDF.
 SPARQL: The query language of the Semantic Web.
 OWL (Web Ontology Language): The schema language, or knowledge representation (KR) language, of the Semantic Web.

What is RDF?
 RDF is the data model of the Semantic Web.
 That means that all data in Semantic Web
technologies is represented as RDF.
 If you store Semantic Web data, it's in RDF.
 If you query Semantic Web data (typically using SPARQL), it's RDF data.
 If you send Semantic Web data to your friend, it's RDF.

[Figure from the RDF 1.1 Primer. Source: http://www.w3.org/TR/rdf11-primer/]

RDF triples
 RDF triples are representations of graph edges: the predicate labels an edge from the subject to the object.
Subject Predicate Object
Mahdi born in Hebron
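
As a minimal sketch of the "triple as graph edge" idea, the example triple can be stored as a Python tuple and grouped into an adjacency list. The function name `to_adjacency` is hypothetical and only for illustration; real RDF stores use IRIs rather than bare strings.

```python
from collections import defaultdict

# Each RDF triple is one directed, labelled edge: subject --predicate--> object.
triples = [
    ("Mahdi", "born in", "Hebron"),
]


def to_adjacency(triples):
    """Group edges by subject so the graph can be walked from any subject node."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = to_adjacency(triples)
print(graph)
```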

RDF example
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/bob#me>
foaf:topic_interest
<http://wikidata.org/entity/Q12418> .

RDF example
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/bob#me>            # subject
foaf:topic_interest                    # predicate
<http://wikidata.org/entity/Q12418> .  # object

SPARQL query language
SELECT ?p ?o
{
<http://nasa.dataincubator.org/spacecraft/1968-089A> ?p ?o
}
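
To make the query's meaning concrete, here is a rough Python sketch of what it asks for: every predicate and object attached to one fixed subject. The sample triples, including their predicates and objects, are made up for illustration and are not taken from the NASA dataset; `select_p_o` is a hypothetical helper, not a SPARQL engine.

```python
SPACECRAFT = "http://nasa.dataincubator.org/spacecraft/1968-089A"

# Hypothetical sample data: predicates and objects are invented for this sketch.
triples = [
    (SPACECRAFT, "http://example.org/name", "Example craft"),
    (SPACECRAFT, "http://example.org/launchYear", "1968"),
    ("http://example.org/other", "http://example.org/name", "Other craft"),
]


def select_p_o(triples, subject):
    """Return the (?p, ?o) bindings of every triple whose subject matches."""
    return [(p, o) for s, p, o in triples if s == subject]


rows = select_p_o(triples, SPACECRAFT)
for predicate, obj in rows:
    print(predicate, obj)
```

This linear scan is exactly what an index is meant to avoid, which is where the following slides pick up.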

What is a database index?
 A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.
 Index goal: the index structure enables fast retrieval of data.

Index example
Key Articles
Leonardo [104,70,12,98]
Mona Lisa [2,201,7,20,12]
Francesco [1,8,900,104]
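
The table above can be sketched as a Python dictionary that maps each key to its list of article numbers. The helper names `lookup` and `articles_with_all` are hypothetical; the data is copied from the slide.

```python
# Key -> article numbers, copied from the index example.
index = {
    "Leonardo": [104, 70, 12, 98],
    "Mona Lisa": [2, 201, 7, 20, 12],
    "Francesco": [1, 8, 900, 104],
}


def lookup(key):
    """Return the articles for one key without scanning any article text."""
    return index.get(key, [])


def articles_with_all(*keys):
    """Return the articles listed under every given key (set intersection)."""
    sets = [set(lookup(key)) for key in keys]
    return sorted(set.intersection(*sets)) if sets else []


print(lookup("Leonardo"))
print(articles_with_all("Leonardo", "Mona Lisa"))
```

A lookup is a single dictionary access, while the price is the extra storage and the work of keeping the lists up to date on every write.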

Paper motivation
 Previous systems provide a storage infrastructure for RDF data, but their index structures do not support typical query scenarios for data from the Web, which results in poor query-answering performance in some cases.

Methodology
 The researchers present a new index structure that handles data from the Web.
 They implemented the index structure in a lightweight software system called YARS.

RDF Index structures
 The authors suggested an index structure that contains two sets:
1. Lexicon: covers the string representations of the nodes of the RDF graph (resources r, literals l, blank nodes b).
2. Quad indexes: cover the quads (triples extended with a context).

1. Lexicon indexes
 NodeOID and OIDNode Index
 Keyword Index

1. Lexicon indexes
 NodeOID and OIDNode Index: NodeOID maps a node to its object identifier (OID); OIDNode is the reverse mapping.
Key Value
<http://www.harth.org/andreas/#me> 3
<http://decker.cn/stefan/> 14
<http://sw.deri.org/~aharth/foaf.rdf> 11
<http://www.deri.org/> 1
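
A minimal sketch of the two lexicon mappings, assuming plain in-memory dictionaries (the slides do not say how YARS stores them on disk). The class name `Lexicon` and its methods are hypothetical; the sample entries come from the table above.

```python
class Lexicon:
    """Two-way mapping between node strings and integer OIDs."""

    def __init__(self):
        self.node_to_oid = {}  # plays the role of the NodeOID index
        self.oid_to_node = {}  # plays the role of the OIDNode index

    def add(self, node, oid):
        """Register a node under an OID in both directions."""
        self.node_to_oid[node] = oid
        self.oid_to_node[oid] = node

    def oid(self, node):
        return self.node_to_oid[node]

    def node(self, oid):
        return self.oid_to_node[oid]


lexicon = Lexicon()
lexicon.add("<http://www.harth.org/andreas/#me>", 3)
lexicon.add("<http://decker.cn/stefan/>", 14)
lexicon.add("<http://www.deri.org/>", 1)

print(lexicon.oid("<http://www.deri.org/>"))
print(lexicon.node(3))
```

Working with small integers instead of long strings keeps the quad indexes compact; the strings are only resolved when a query arrives or results are returned.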

1. Lexicon indexes
 Keyword Index (popular in search engines)
Key No. of hits List of hits
“Andreas” 1 3
“Decker” 1 11
“Harth” 1 3
“Stefan” 2 11,13
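
The keyword index can be sketched as an inverted index from words to the OIDs of the literals that contain them. The literal strings below are assumptions chosen so that the result reproduces the table above; the slides do not show the underlying literals, and `build_keyword_index` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical literals keyed by OID, chosen to reproduce the table on the slide.
literals = {
    3: "Andreas Harth",
    11: "Stefan Decker",
    13: "Stefan",
}


def build_keyword_index(literals):
    """Map each word to the sorted list of OIDs whose literal contains it."""
    index = defaultdict(list)
    for oid in sorted(literals):
        for word in literals[oid].split():
            if oid not in index[word]:
                index[word].append(oid)
    return dict(index)


keyword_index = build_keyword_index(literals)
for key in sorted(keyword_index):
    hits = keyword_index[key]
    print(key, len(hits), hits)
```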

2. Quad indexes
 Access Patterns
 Combined Indexes
 Occurrence Counts

2. Quad indexes
A- Access Patterns
No Access pattern No Access pattern
1 (?:?:?:?) 9 (s:?:o:c)
2 (s:?:?:?) 10 (?:?:o:c)
3 (s:p:?:?) 11 (?:?:o:?)
4 (s:p:o:?) 12 (?:?:?:c)
5 (s:p:o:c) 13 (s:?:?:c)
6 (?:p:?:?) 14 (s:p:?:c)
7 (?:p:o:?) 15 (?:p:?:c)
8 (?:p:o:c) 16 (s:?:o:?)
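
The 16 patterns follow from the fact that each of the four quad positions (s, p, o, c) is either bound or a variable, giving 2^4 combinations. A short sketch that enumerates them (the output order differs from the slide's numbering, and `access_patterns` is a hypothetical name):

```python
from itertools import product

POSITIONS = ("s", "p", "o", "c")  # subject, predicate, object, context


def access_patterns():
    """Enumerate every bound/unbound combination of the four quad positions."""
    patterns = []
    for bound in product((False, True), repeat=len(POSITIONS)):
        parts = [name if is_bound else "?" for name, is_bound in zip(POSITIONS, bound)]
        patterns.append("(" + ":".join(parts) + ")")
    return patterns


patterns = access_patterns()
print(len(patterns))
print(patterns)
```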

2. Quad indexes
A- Access Patterns
No Access pattern values
1 (?:?:?:?) [1,5,3]
2 (s:?:?:?) [2]
3 (s:p:?:?) [9,8,2,3]
4 (s:p:o:?) [1,3]
5 (s:p:o:c) [1]
6 (?:p:?:?) [76,9]
7 (?:p:o:?) [2,3]

2. Quad indexes
B- Combined Indexes
Index Access patterns covered
spoc (?:?:?:?) (s:?:?:?) (s:p:?:?) (s:p:o:?) (s:p:o:c)
poc (?:p:?:?) (?:p:o:?) (?:p:o:c)
ocs (?:?:o:?) (?:?:o:c) (s:?:o:c)
csp (?:?:?:c) (s:?:?:c) (s:p:?:c)
cp (?:p:?:c)
os (s:?:o:?)
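
One way to read the combined indexes: a pattern can be answered by an index when its bound positions form a prefix of that index's key order, so the lookup becomes a range scan. The sketch below picks such an index for each pattern. It assumes the third index is ordered o, c, s (written `ocs` here), since that ordering is what makes (?:?:o:c) a prefix lookup; `choose_index` is a hypothetical helper, not YARS code.

```python
# Key orders of the six combined indexes (third one assumed to be o, c, s).
INDEXES = ("spoc", "poc", "ocs", "csp", "cp", "os")


def choose_index(pattern):
    """Return the first index whose leading key parts are exactly the bound positions.

    pattern is written like '(s:?:o:c)'; '?' marks an unbound position.
    """
    bound = {part for part in pattern.strip("()").split(":") if part != "?"}
    for index in INDEXES:
        if set(index[:len(bound)]) == bound:
            return index
    return None


for pattern in ("(?:?:?:?)", "(?:p:o:?)", "(?:?:o:c)", "(s:?:?:c)", "(?:p:?:c)", "(s:?:o:?)"):
    print(pattern, "->", choose_index(pattern))
```

Under this reading, six indexes are enough to serve all 16 access patterns instead of keeping one index per pattern.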

2. Quad indexes
C- Occurrence Counts
No Access pattern values count
1 (?:?:?:?) [1,5,3] 3
2 (s:?:?:?) [2] 1
3 (s:p:?:?) [9,8,2,3] 4
4 (s:p:o:?) [1,3] 2
5 (s:p:o:c) [1] 1
6 (?:p:?:?) [76,9] 2
7 (?:p:o:?) [2,3] 2
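
In the table, each count is simply the number of values stored for the pattern. The sketch below recomputes the counts and sorts the patterns so that the ones with the fewest matches come first. Using the counts to order patterns this way is an assumption for illustration; the slide does not state what the counts are used for.

```python
# Values per access pattern, copied from the table above.
values = {
    "(?:?:?:?)": [1, 5, 3],
    "(s:?:?:?)": [2],
    "(s:p:?:?)": [9, 8, 2, 3],
    "(s:p:o:?)": [1, 3],
    "(s:p:o:c)": [1],
    "(?:p:?:?)": [76, 9],
    "(?:p:o:?)": [2, 3],
}

# The occurrence count is the number of stored values for each pattern.
counts = {pattern: len(hits) for pattern, hits in values.items()}

# Assumed use: evaluate the patterns with the fewest matches first.
most_selective_first = sorted(counts, key=counts.get)

for pattern in most_selective_first:
    print(pattern, counts[pattern])
```

Storing the count next to the values means it can be read directly instead of being recomputed by scanning the list.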

YARS
 A web application built in Java.
 Has two parts:
 a storage component that handles both persistent and in-memory indexes.
 a query handler that performs query processing and evaluation.

Experiment
 They evaluated the performance based on a dataset of 2.8 million triples (293 MB).
 The test server had:
 Pentium 4, 2.4 GHz
 4 GB RAM
 Debian Sarge

Experiment
They considered the following RDF stores for evaluation:
 Sesame.
 Kowari (failed to get a running version).
 Redland.
 Jena2 ([9] shows that Sesame generally outperforms Jena).

Experiment
Experiment parts:
 Index Construction.
 Queries.

Result – index construction
System Index size (bytes)
Redland 2,164,019,200
Sesame MySQL 340,381,636
Sesame native 39,997,992
YARS 1,090,002,944
Table 8: Index size for the synthetic Univ20 dataset.

Result - queries
No Query
1 ?x rdf:type univ:UndergraduateStudent
2 ?x ?p "UndergraduateStudent0"
3 <http://www.University965.edu> ?p ?o
4 ?x univ:worksFor ?y

Query Redland Sesame MySQL Sesame Native YARS
1 0:10.48 0:18.87 1:05.16 0:18.41
2 0:44.14 0:00.73 0:00.55 0:00.49
3 0:44.15 0:00.46 0:00.47 0:00.32
4 3:04.21 0:03.42 0:01.95 0:00.47
Performance results for quad queries (times in min:sec).
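
To compare the timings, the sketch below converts each entry to seconds and picks the fastest system per query. It assumes the values are in minutes:seconds format and that the four time columns correspond, in order, to Redland, Sesame MySQL, Sesame Native and YARS; `to_seconds` is a hypothetical helper.

```python
def to_seconds(stamp):
    """Convert a 'm:ss.ff' string into seconds as a float."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# Timings copied from the results table, keyed by query number.
timings = {
    1: {"Redland": "0:10.48", "Sesame MySQL": "0:18.87", "Sesame Native": "1:05.16", "YARS": "0:18.41"},
    2: {"Redland": "0:44.14", "Sesame MySQL": "0:00.73", "Sesame Native": "0:00.55", "YARS": "0:00.49"},
    3: {"Redland": "0:44.15", "Sesame MySQL": "0:00.46", "Sesame Native": "0:00.47", "YARS": "0:00.32"},
    4: {"Redland": "3:04.21", "Sesame MySQL": "0:03.42", "Sesame Native": "0:01.95", "YARS": "0:00.47"},
}

# For each query, find the system with the smallest time.
fastest = {
    query: min(systems, key=lambda name: to_seconds(systems[name]))
    for query, systems in timings.items()
}

for query in sorted(fastest):
    print(query, fastest[query])
```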

Conclusion
 The authors addressed query processing for RDF, which is an important issue in the Semantic Web.
 YARS has some overhead for resolving dependencies and ordering in comparison with the other systems.

Criticism
 In the experiment, the researchers excluded the Kowari engine because they could not get a running version of it.
