SlideShare une entreprise Scribd logo
1  sur  24
Representing Verifiable Statistical 
Computations as linked data 
Hania Farham 
The Web Foundation 
London 
hania@webfoundation.org 
Jose Emilio Labra Gayo 
WESO Research group 
University of Oviedo 
Spain 
labra@uniovi.es 
Juan Castro Fernández 
WESO Research Group 
University of Oviedo 
Spain 
juan@weso.es 
Jose María Álvarez Rodríguez 
Dept. Computer Science 
Carlos III University 
Spain 
josemaria.alvarez@uc3m.es
This talk in one slide 
Describe the WebIndex Project 
Represents an statistical index 
Data Model based 
Computation and validation process 
Visualization
Web Index 
Measure WWW's contribution to development and 
human rights by country 
Developed by the Web Foundation 
Web page: 
http://thewebindex.org 
Linked data portal: 
http://data.webfoundation.org/webindex/2013
Technical details 
Index made from 
81 countries, 5 years (2007-12 
116 indicators: 
84 Primary (questionnaires) 
32 Secondary (external sources) 
Linked data portal 
Modeled on top of RDF Data Cube 
Linked data: DBPedia, Organizations, etc.
Different versions 
2012. Visualizations & linked data portal 
RDF representation based on RDF Data Cube 
Internal validation 
No representation of computations 
2013. Include data about computations 
Goal: External agents can verify data & computations 
2014. Currently in development
Webindex workflow 
Data 
(Excel) 
RDF 
Datastore 
Visualizations 
Linked data portal 
Conversion 
Excel  RDF 
Computation 
Enrichment 
SPARQL endpoint 
http://data.webfoundation.org/sparql
Computation process (1) 
Simplified with one indicator, 3 years and 4 countries 
Raw Data (Indicator A) 
Country 2009 2010 2011 
Spain 4 5 3 
Finland 4 6 
Armenia 1 
Country 2009 2010 2011 
Spain 4 5 3 
Country 2009 2010 2011 
Spain 4 5 3 
Finland 4 5 6 
Finland 4 5 6 
Armenia 1 1 1 
Armenia 1 1 1 
Chile 6 8 10.6 
Chile 6 8 10.6 
Impute Data 
Country 2009 2010 2011 
Country 2009 2010 2011 
Spain 4 5 3 
Spain 4 5 3 
Finland 4 5 6 
Finland 4 5 6 
Armenia 1 1 1 
Armenia 1 1 1 
Chile 6 8 10.6 
Chile 6 8 10.6 
Country 2009 2010 2011 
Spain 4 5 3 
Finland 4 6 
Armenia 1 
Chile 6 8 
Chile 6 8 
Country 2009 2010 2011 
Spain 4 5 3 
Finland 4 5 6 
Armenia 1 1 1 
Chile 6 8 10.6 
Filter Data 
Normalize Data (z-scores) 
Country 2009 2010 2011 
Country 2009 2010 2011 
Spain -0.57 -0.57 -0.92 
Spain -0.57 -0.57 -0.92 
Finland -0.57 -0.57 -0.14 
Finland -0.57 -0.57 -0.14 
Chile 1.15 1.15 1.06 
Country 2009 2010 2011 
Spain 4 5 3 
Finland 4 6 
Armenia 1 
Chile 6 8 
Country 2009 2010 2011 
Spain 4 5 3 
Finland 4 5 6 
Armenia 1 1 1 
Chile 6 8 10.6 
Country 2009 2010 2011 
Spain -0.57 -0.57 -0.92 
Finland -0.57 -0.57 -0.14 
Chile 1.15 1.15 1.06 
Chile 1.15 1.15 1.06 
Mean 
푥푖 = 
푥푖−1 + 푥푖+1 
2 
Average growth 
푥푛 ∙ 
푥푛−1 
푥푛−2 
+ ⋯ + 
푥2 
푥1 
푛 − 1 
z-score 
푧 = 
푥 − 휇 
휎 
More details can be found here: http://thewebindex.org/about/methodology/computation/
Computation Process (2) 
Simplified with one indicator, 3 years and 4 countries 
Normalize Data (z-scores) 
Country 2009 2010 2011 
Country 2009 2010 2011 
Country 2009 2010 2011 
Spain -0.57 -0.57 -0.92 
Spain -0.57 -0.57 -0.92 
Spain -0.57 -0.57 -0.92 
Finland -0.57 -0.57 -0.14 
Finland -0.57 -0.57 -0.14 
Finland -0.57 -0.57 -0.14 
Chile 1.15 1.15 1.06 
Chile 1.15 1.15 1.06 
Chile 1.15 1.15 1.06 
Adjust data 
Country A B C D ... 
Spain 8 7 9.1 7.1 ... 
Finland 7 8 7.1 8 ... 
Chile 8 9 7.6 6 ... 
Group indicators 
Country Readiness Impact Web Composite 
Spain 5.7 3.5 5.1 4.5 
Finland 5.5 3.9 7.1 4.9 
Chile 6.7 4.5 7.6 5.1 
Rankings 
푥푖 = 푥푖 + 훿 
Country Readiness Impact Web Composite 
Spain 2 3 3 3 
Finland 3 2 2 2 
Chile 1 1 1 1 
More details can be found here: http://thewebindex.org/about/methodology/computation/
WebIndex data model 
Observations have values by years 
Observations refer to indicators and countries 
ITU_B 2011 2012 2013 ... 
Germany 20.34 35.46 37.12 ... 
Spain 19.12 23.78 25.45 ... 
France 20.12 21.34 28.34 ... 
... ... ... ... ... 
ITU_B 2011 2012 2013 ... 
Germany 20.34 35.46 37.12 ... 
Spain 19.12 23.78 25.45 ... 
France 20.12 21.34 28.34 ... 
... ... ... ... ... 
ITU_B 2010 2011 2012 ... 
Germany 20.34 35.46 37.12 ... 
Spain 19.12 23.78 25.45 ... 
France 20.12 21.34 28.34 ... 
... ... ... ... ... 
Model based on RDF Data Cube 
Main entity = Observation 
DataSets are published by Organizations 
Datasets contain several slices 
Slices group observations 
Observation 
Indicator Years 
% Broadband subscribers 
Countries 
Slice 
DataSet 
Indicators are provided by Organizations 
Examples 
ITU = International Telecommunication Union 
UN = United Nations 
WB = World bank 
...
Data model* 
1..n 
qb:slice 
Observation 
qb:observation 
rdf:type = qb:Observation 
cex:value: xsd:float 
dc:issued: xsd:dateTime 
rdfs:label: xsd:String 
cex:ref-year: xsd:gYear 
Organization 
rdf:type = org:Organization 
rdfs:label: xsd:String 
foaf:homepage: IRI 
Indicator 
rdf:type = cex:Primary 
| cex:Secondary 
rdfs:label: xsd:string 
rdfs:comment: xsd:string 
skos:notation: xsd:String 
Slice 
rdf:type = qb:Slice 
qb:sliceStructure: wf:sliceByArea 
Country 
rdf:type = wf:Country 
wf:iso2 : xsd:string 
wf:iso3 : xsd:string 
rdfs:label : xsd:String 
DataSet 
rdf:type = qb:DataSet 
qb:structure : wf:DSD 
rdfs:label : xsd:String 
qb:observation 
cex:ref-area cex:indicator 
wf:provider 
dc:publisher 
1..n 
*Simplified 
cex:computation 
Computation 
rdf:type = cex:Computation
Excel  RDF (Turtle) 
indicator:ITU_B 
a wf:SecondaryIndicator ; 
rdfs:label "Broadband subscribers %" 
. 
dataset:DITU a qb:DataSet ; 
rdfs:label "ITU Dataset" ; 
dc:publisher org:ITU ; 
qb:slice slice:ITU10B , 
slice:ITU11B, 
. ... 
... 
slice:ITU11B a qb:Slice ; 
qb:sliceStructure wf:sliceByYear ; 
qb:observation obs:obs8165, 
obs:obs8166, 
... 
... 
org:ITU a org:Organization ; 
rdfs:label "ITU" ; 
foaf:homepage <http://www.itu.int/> 
. 
country:Spain a wf:Country ; 
wf:iso2 "ES" ; wf:iso3 "ESP" ; 
rdfs:label "Spain" 
. 
interrelated 
linked 
data 
obs:obs8165 a qb:Observation ; 
rdfs:label "ITU B in ESP, 2011" ; 
cex:indicator indicator:ITU_B ; 
qb:dataSet dataset:DITU ; 
cex:value "23.78"^^xsd:float ; 
cex:ref-year 2011 ; 
cex:ref-area country:Spain ; 
dc:issued "2013-05-30"^^xsd:date ; 
cex:computation cex:raw ; 
... 
.
Computation process 
1. First computation 
Statistics experts using Excel 
2. Second computation (WESO team) 
1st. approach: SPARQL Update queries 
Can reuse the validation queries 
Declarative approach 
Problem: Efficiency & debugging 
2nd. approach: Special purpose program 
Performs computations and adds metadata wiCompute 
http://www.github.com/weso/wiCompute
Computation representation 
Computex Vocabulary 
Describes statistical computation procedures 
Compatible with RDF Data Cube 
Some terms: 
cex:Concept Entities that are beind indexed 
cex:Indicator Dimension whose values add information to the 
index 
cex:Computation Represents the different computation types 
It can be: 
cex:Raw, cex:Mean, cex:Increment, cex:Copy, 
cex:Z-Score, cex:Ranking, cex:AverageGrowth, 
cex:WeightedMean 
cex:WeightSchema Weight schema for a list of indicators 
http://purl.org/weso/ontology/computex
Example of a computed 
observation 
obs:c39049 a qb:Observation ; 
rdfs:label "ITU B in ESP, 2011, Normalized" ; 
cex:indicator indicator:ITU_B ; 
qb:dataSet dataset:computed366 ; 
cex:value "0.859"^^xsd:double ; 
cex:ref-year 2011 ; 
cex:ref-area country:Spain ; 
cex:computation wi-comp:comp39050 ; 
... 
. 
푧 = 
Normalization using z-score 
푥 − 휇 
휎 
= 
23.78 − 12.816 
wi-comp:39050 a cex:Normalize ; 
cex:stdDesv "12.766"^^xsd:double ; 
cex:mean "12.816"^^xsd:double ; 
cex:slice wi-slice:sliceITUB_2011 ; 
cex:observation obs:obs8165 ; 
. 
obs:obs8165 a qb:Observation ; 
cex:value "23.78"^^xsd:double ; 
... 
. 
12.766 
= 0.859 
wi-slice:sliceITU_B_2011 a qb:Slice ; 
qb:observation obs:8471, 
obs:8434, ...; 
. 
ok! 
URI of computed observation: 
http://data.webfoundation.org/webindex/v2013/observation/computed_2011_1386752461095_39049
Verifying linked data contents 
Once the linked data has been published 
How can an external agent verify it? 
2 approaches: 
SPARQL Queries 
Shape expressions
SPARQL validation 
CONSTRUCT queries like: 
CONSTRUCT { 
[ a cex:Error ; cex:errorParam # ... omitted 
cex:msg "Observation has two different values" . ] 
} WHERE { 
?obs a qb:Observation . 
?obs cex:value ?value1 . 
?obs cex:value ?value2 . 
FILTER ( ?value1 != ?value2 ) 
} 
Detects if one observation has more than 1 value
SPARQL validation 
More advanced queries like: 
CONSTRUCT { 
[ a cex:Error ; cex:errorParam # ...omitted 
cex:msg "Mean value does not match" ] . 
} WHERE { 
?obs a qb:Observation ; 
cex:computation ?comp ; 
cex:value ?val . 
?comp a cex:Mean . 
{ SELECT (AVG(?value) as ?mean) ?comp WHERE { 
?comp cex:observation ?obs1 . 
?obs1 cex:value ?value ; 
} GROUP BY ?comp 
} 
FILTER (abs(?mean - ?val) > 0.0001) 
} Detects if an observation whose computation is 
declared as the mean is really the mean
Shape Expressions description 
Shape expressions declare the shape of RDF data 
Human readable and machine processable 
Shape Expressions for team communication 
Developers know which triples must generate/consume 
<Observation> { 
rdf:type (qb:Observation) 
, cex:value xsd:float ? 
, dc:issued xsd:dateTime 
, rdfs:label xsd:string ? 
, qb:dataSet @<DataSet> 
, cex:ref-area @<Country> 
, cex:indicator @<Indicator> 
, cex:ref-year xsd:gYear 
, cex:computation @<Computation> 
} 
Documentation http://weso.github.io/wiDoc
Visualization 
Visualization tool: Wesby, Inspired by Pubby 
Enables easy customization by templates 
Different templates are chosen based on rdf:type 
Data load on demand 
SPARQL queries 
Responsive design and mobile friendly
Visualization 
Example: Template for Observations 
http://data.webfoundation.org/webindex/v2013/observation/obs8003
Visualization 
Example: Template for Countries 
http://data.webfoundation.org/webindex/v2013/country/ESP
Conclusions 
WebIndex: 
Linked data portal (medium size ≈ 3,5 mill triples) 
It adds data about computation 
Computations represented as linked data 
We explored some possibilities for validation 
SPARQL validation: very expressive, declarative 
Shape Expressions: more readable 
Visualization by templates was fine 
Challenge to visualize bNodes
Future work & reflection 
Computex vocabulary was a first attempt 
Further work to employ it in similar projects 
Visualization of computations 
Define wesby templates to visualize computations 
Reflection: Was it worth the effort? 
We produced data that can be externally verified 
However... 
...we still don't have consumers who need it
End of presentation 
More info: 
WESO Research group 
http://www.weso.es

Contenu connexe

Similaire à Representing verifiable statistical index computations as linked data

Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...Jose Emilio Labra Gayo
 
RDF Validation Future work and applications
RDF Validation Future work and applicationsRDF Validation Future work and applications
RDF Validation Future work and applicationsJose Emilio Labra Gayo
 
Linked Data Hypercubes - Semtech London
Linked Data Hypercubes - Semtech LondonLinked Data Hypercubes - Semtech London
Linked Data Hypercubes - Semtech LondonDave Reynolds
 
Linked Data Hypercubes
Linked Data HypercubesLinked Data Hypercubes
Linked Data HypercubesDave Reynolds
 
Validating and Describing Linked Data Portals using RDF Shape Expressions
Validating and Describing Linked Data Portals using RDF Shape ExpressionsValidating and Describing Linked Data Portals using RDF Shape Expressions
Validating and Describing Linked Data Portals using RDF Shape ExpressionsJose Emilio Labra Gayo
 
2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledgeChristopher Williams
 
ICWE2017 BigDataEurope
ICWE2017 BigDataEuropeICWE2017 BigDataEurope
ICWE2017 BigDataEuropeBigData_Europe
 
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisWorkshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisOlga Scrivner
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysisYabebal Ayalew
 
Alex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning ApplicationsAlex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning ApplicationsSri Ambati
 
print mod 2.pdf
print mod 2.pdfprint mod 2.pdf
print mod 2.pdflathass5
 
Predicting query performance and explaining results to assist Linked Data con...
Predicting query performance and explaining results to assist Linked Data con...Predicting query performance and explaining results to assist Linked Data con...
Predicting query performance and explaining results to assist Linked Data con...Rakebul Hasan
 
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem ChemAxon
 
Proof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityProof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityOpen Cyber University of Korea
 
Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Dr Sulaimon Afolabi
 
Discover deep insights with Salesforce Einstein Analytics and Discovery
Discover deep insights with Salesforce Einstein Analytics and DiscoveryDiscover deep insights with Salesforce Einstein Analytics and Discovery
Discover deep insights with Salesforce Einstein Analytics and DiscoveryNew Delhi Salesforce Developer Group
 
Datamining at SemWebPro 2012
Datamining at SemWebPro 2012Datamining at SemWebPro 2012
Datamining at SemWebPro 2012Vincent Michel
 
Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8dallemang
 

Similaire à Representing verifiable statistical index computations as linked data (20)

Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
 
RDF Validation Future work and applications
RDF Validation Future work and applicationsRDF Validation Future work and applications
RDF Validation Future work and applications
 
Linked Data Hypercubes - Semtech London
Linked Data Hypercubes - Semtech LondonLinked Data Hypercubes - Semtech London
Linked Data Hypercubes - Semtech London
 
Linked Data Hypercubes
Linked Data HypercubesLinked Data Hypercubes
Linked Data Hypercubes
 
Validating and Describing Linked Data Portals using RDF Shape Expressions
Validating and Describing Linked Data Portals using RDF Shape ExpressionsValidating and Describing Linked Data Portals using RDF Shape Expressions
Validating and Describing Linked Data Portals using RDF Shape Expressions
 
Sap business objects bobi training
Sap business objects bobi trainingSap business objects bobi training
Sap business objects bobi training
 
2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge
 
ICWE2017 BigDataEurope
ICWE2017 BigDataEuropeICWE2017 BigDataEurope
ICWE2017 BigDataEurope
 
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisWorkshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
Alex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning ApplicationsAlex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning Applications
 
print mod 2.pdf
print mod 2.pdfprint mod 2.pdf
print mod 2.pdf
 
Predicting query performance and explaining results to assist Linked Data con...
Predicting query performance and explaining results to assist Linked Data con...Predicting query performance and explaining results to assist Linked Data con...
Predicting query performance and explaining results to assist Linked Data con...
 
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
 
Olap
OlapOlap
Olap
 
Proof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityProof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics Interoperability
 
Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1
 
Discover deep insights with Salesforce Einstein Analytics and Discovery
Discover deep insights with Salesforce Einstein Analytics and DiscoveryDiscover deep insights with Salesforce Einstein Analytics and Discovery
Discover deep insights with Salesforce Einstein Analytics and Discovery
 
Datamining at SemWebPro 2012
Datamining at SemWebPro 2012Datamining at SemWebPro 2012
Datamining at SemWebPro 2012
 
Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8
 

Plus de Jose Emilio Labra Gayo

Introducción a la investigación/doctorado
Introducción a la investigación/doctoradoIntroducción a la investigación/doctorado
Introducción a la investigación/doctoradoJose Emilio Labra Gayo
 
Challenges and applications of RDF shapes
Challenges and applications of RDF shapesChallenges and applications of RDF shapes
Challenges and applications of RDF shapesJose Emilio Labra Gayo
 
Legislative data portals and linked data quality
Legislative data portals and linked data qualityLegislative data portals and linked data quality
Legislative data portals and linked data qualityJose Emilio Labra Gayo
 
Validating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesValidating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesJose Emilio Labra Gayo
 
Legislative document content extraction based on Semantic Web technologies
Legislative document content extraction based on Semantic Web technologiesLegislative document content extraction based on Semantic Web technologies
Legislative document content extraction based on Semantic Web technologiesJose Emilio Labra Gayo
 
Como publicar datos: hacia los datos abiertos enlazados
Como publicar datos: hacia los datos abiertos enlazadosComo publicar datos: hacia los datos abiertos enlazados
Como publicar datos: hacia los datos abiertos enlazadosJose Emilio Labra Gayo
 
Arquitectura de la Web y Computación en el Servidor
Arquitectura de la Web y Computación en el ServidorArquitectura de la Web y Computación en el Servidor
Arquitectura de la Web y Computación en el ServidorJose Emilio Labra Gayo
 

Plus de Jose Emilio Labra Gayo (20)

Publicaciones de investigación
Publicaciones de investigaciónPublicaciones de investigación
Publicaciones de investigación
 
Introducción a la investigación/doctorado
Introducción a la investigación/doctoradoIntroducción a la investigación/doctorado
Introducción a la investigación/doctorado
 
Challenges and applications of RDF shapes
Challenges and applications of RDF shapesChallenges and applications of RDF shapes
Challenges and applications of RDF shapes
 
Legislative data portals and linked data quality
Legislative data portals and linked data qualityLegislative data portals and linked data quality
Legislative data portals and linked data quality
 
Validating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesValidating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectives
 
Wikidata
WikidataWikidata
Wikidata
 
Legislative document content extraction based on Semantic Web technologies
Legislative document content extraction based on Semantic Web technologiesLegislative document content extraction based on Semantic Web technologies
Legislative document content extraction based on Semantic Web technologies
 
ShEx by Example
ShEx by ExampleShEx by Example
ShEx by Example
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 
Introducción a la Web Semántica
Introducción a la Web SemánticaIntroducción a la Web Semántica
Introducción a la Web Semántica
 
RDF Data Model
RDF Data ModelRDF Data Model
RDF Data Model
 
2017 Tendencias en informática
2017 Tendencias en informática2017 Tendencias en informática
2017 Tendencias en informática
 
RDF, linked data and semantic web
RDF, linked data and semantic webRDF, linked data and semantic web
RDF, linked data and semantic web
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 
19 javascript servidor
19 javascript servidor19 javascript servidor
19 javascript servidor
 
Como publicar datos: hacia los datos abiertos enlazados
Como publicar datos: hacia los datos abiertos enlazadosComo publicar datos: hacia los datos abiertos enlazados
Como publicar datos: hacia los datos abiertos enlazados
 
16 Alternativas XML
16 Alternativas XML16 Alternativas XML
16 Alternativas XML
 
XSLT
XSLTXSLT
XSLT
 
XPath
XPathXPath
XPath
 
Arquitectura de la Web y Computación en el Servidor
Arquitectura de la Web y Computación en el ServidorArquitectura de la Web y Computación en el Servidor
Arquitectura de la Web y Computación en el Servidor
 

Dernier

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Dernier (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Representing verifiable statistical index computations as linked data

  • 1. Representing Verifiable Statistical Computations as linked data Hania Farham The Web Foundation London hania@webfoundation.org Jose Emilio Labra Gayo WESO Research group University of Oviedo Spain labra@uniovi.es Juan Castro Fernández WESO Research Group University of Oviedo Spain juan@weso.es Jose María Álvarez Rodríguez Dept. Computer Science Carlos III University Spain josemaria.alvarez@uc3m.es
  • 2. This talk in one slide Describe the WebIndex Project Represents an statistical index Data Model based Computation and validation process Visualization
  • 3. Web Index Measure WWW's contribution to development and human rights by country Developed by the Web Foundation Web page: http://thewebindex.org Linked data portal: http://data.webfoundation.org/webindex/2013
  • 4. Technical details Index made from 81 countries, 5 years (2007-12 116 indicators: 84 Primary (questionnaires) 32 Secondary (external sources) Linked data portal Modeled on top of RDF Data Cube Linked data: DBPedia, Organizations, etc.
  • 5. Different versions 2012. Visualizations & linked data portal RDF representation based on RDF Data Cube Internal validation No representation of computations 2013. Include data about computations Goal: External agents can verify data & computations 2014. Currently in development
  • 6. Webindex workflow Data (Excel) RDF Datastore Visualizations Linked data portal Conversion Excel  RDF Computation Enrichment SPARQL endpoint http://data.webfoundation.org/sparql
  • 7. Computation process (1) Simplified with one indicator, 3 years and 4 countries Raw Data (Indicator A) Country 2009 2010 2011 Spain 4 5 3 Finland 4 6 Armenia 1 Country 2009 2010 2011 Spain 4 5 3 Country 2009 2010 2011 Spain 4 5 3 Finland 4 5 6 Finland 4 5 6 Armenia 1 1 1 Armenia 1 1 1 Chile 6 8 10.6 Chile 6 8 10.6 Impute Data Country 2009 2010 2011 Country 2009 2010 2011 Spain 4 5 3 Spain 4 5 3 Finland 4 5 6 Finland 4 5 6 Armenia 1 1 1 Armenia 1 1 1 Chile 6 8 10.6 Chile 6 8 10.6 Country 2009 2010 2011 Spain 4 5 3 Finland 4 6 Armenia 1 Chile 6 8 Chile 6 8 Country 2009 2010 2011 Spain 4 5 3 Finland 4 5 6 Armenia 1 1 1 Chile 6 8 10.6 Filter Data Normalize Data (z-scores) Country 2009 2010 2011 Country 2009 2010 2011 Spain -0.57 -0.57 -0.92 Spain -0.57 -0.57 -0.92 Finland -0.57 -0.57 -0.14 Finland -0.57 -0.57 -0.14 Chile 1.15 1.15 1.06 Country 2009 2010 2011 Spain 4 5 3 Finland 4 6 Armenia 1 Chile 6 8 Country 2009 2010 2011 Spain 4 5 3 Finland 4 5 6 Armenia 1 1 1 Chile 6 8 10.6 Country 2009 2010 2011 Spain -0.57 -0.57 -0.92 Finland -0.57 -0.57 -0.14 Chile 1.15 1.15 1.06 Chile 1.15 1.15 1.06 Mean 푥푖 = 푥푖−1 + 푥푖+1 2 Average growth 푥푛 ∙ 푥푛−1 푥푛−2 + ⋯ + 푥2 푥1 푛 − 1 z-score 푧 = 푥 − 휇 휎 More details can be found here: http://thewebindex.org/about/methodology/computation/
  • 8. Computation Process (2) Simplified with one indicator, 3 years and 4 countries Normalize Data (z-scores) Country 2009 2010 2011 Country 2009 2010 2011 Country 2009 2010 2011 Spain -0.57 -0.57 -0.92 Spain -0.57 -0.57 -0.92 Spain -0.57 -0.57 -0.92 Finland -0.57 -0.57 -0.14 Finland -0.57 -0.57 -0.14 Finland -0.57 -0.57 -0.14 Chile 1.15 1.15 1.06 Chile 1.15 1.15 1.06 Chile 1.15 1.15 1.06 Adjust data Country A B C D ... Spain 8 7 9.1 7.1 ... Finland 7 8 7.1 8 ... Chile 8 9 7.6 6 ... Group indicators Country Readiness Impact Web Composite Spain 5.7 3.5 5.1 4.5 Finland 5.5 3.9 7.1 4.9 Chile 6.7 4.5 7.6 5.1 Rankings 푥푖 = 푥푖 + 훿 Country Readiness Impact Web Composite Spain 2 3 3 3 Finland 3 2 2 2 Chile 1 1 1 1 More details can be found here: http://thewebindex.org/about/methodology/computation/
  • 9. WebIndex data model Observations have values by years Observations refer to indicators and countries ITU_B 2011 2012 2013 ... Germany 20.34 35.46 37.12 ... Spain 19.12 23.78 25.45 ... France 20.12 21.34 28.34 ... ... ... ... ... ... ITU_B 2011 2012 2013 ... Germany 20.34 35.46 37.12 ... Spain 19.12 23.78 25.45 ... France 20.12 21.34 28.34 ... ... ... ... ... ... ITU_B 2010 2011 2012 ... Germany 20.34 35.46 37.12 ... Spain 19.12 23.78 25.45 ... France 20.12 21.34 28.34 ... ... ... ... ... ... Model based on RDF Data Cube Main entity = Observation DataSets are published by Organizations Datasets contain several slices Slices group observations Observation Indicator Years % Broadband subscribers Countries Slice DataSet Indicators are provided by Organizations Examples ITU = International Telecommunication Union UN = United Nations WB = World bank ...
  • 10. Data model* 1..n qb:slice Observation qb:observation rdf:type = qb:Observation cex:value: xsd:float dc:issued: xsd:dateTime rdfs:label: xsd:String cex:ref-year: xsd:gYear Organization rdf:type = org:Organization rdfs:label: xsd:String foaf:homepage: IRI Indicator rdf:type = cex:Primary | cex:Secondary rdfs:label: xsd:string rdfs:comment: xsd:string skos:notation: xsd:String Slice rdf:type = qb:Slice qb:sliceStructure: wf:sliceByArea Country rdf:type = wf:Country wf:iso2 : xsd:string wf:iso3 : xsd:string rdfs:label : xsd:String DataSet rdf:type = qb:DataSet qb:structure : wf:DSD rdfs:label : xsd:String qb:observation cex:ref-area cex:indicator wf:provider dc:publisher 1..n *Simplified cex:computation Computation rdf:type = cex:Computation
  • 11. Excel  RDF (Turtle) indicator:ITU_B a wf:SecondaryIndicator ; rdfs:label "Broadband subscribers %" . dataset:DITU a qb:DataSet ; rdfs:label "ITU Dataset" ; dc:publisher org:ITU ; qb:slice slice:ITU10B , slice:ITU11B, . ... ... slice:ITU11B a qb:Slice ; qb:sliceStructure wf:sliceByYear ; qb:observation obs:obs8165, obs:obs8166, ... ... org:ITU a org:Organization ; rdfs:label "ITU" ; foaf:homepage <http://www.itu.int/> . country:Spain a wf:Country ; wf:iso2 "ES" ; wf:iso3 "ESP" ; rdfs:label "Spain" . interrelated linked data obs:obs8165 a qb:Observation ; rdfs:label "ITU B in ESP, 2011" ; cex:indicator indicator:ITU_B ; qb:dataSet dataset:DITU ; cex:value "23.78"^^xsd:float ; cex:ref-year 2011 ; cex:ref-area country:Spain ; dc:issued "2013-05-30"^^xsd:date ; cex:computation cex:raw ; ... .
  • 12. Computation process 1. First computation Statistics experts using Excel 2. Second computation (WESO team) 1st. approach: SPARQL Update queries Can reuse the validation queries Declarative approach Problem: Efficiency & debugging 2nd. approach: Special purpose program Performs computations and adds metadata wiCompute http://www.github.com/weso/wiCompute
  • 13. Computation representation Computex Vocabulary Describes statistical computation procedures Compatible with RDF Data Cube Some terms: cex:Concept Entities that are beind indexed cex:Indicator Dimension whose values add information to the index cex:Computation Represents the different computation types It can be: cex:Raw, cex:Mean, cex:Increment, cex:Copy, cex:Z-Score, cex:Ranking, cex:AverageGrowth, cex:WeightedMean cex:WeightSchema Weight schema for a list of indicators http://purl.org/weso/ontology/computex
  • 14. Example of a computed observation obs:c39049 a qb:Observation ; rdfs:label "ITU B in ESP, 2011, Normalized" ; cex:indicator indicator:ITU_B ; qb:dataSet dataset:computed366 ; cex:value "0.859"^^xsd:double ; cex:ref-year 2011 ; cex:ref-area country:Spain ; cex:computation wi-comp:comp39050 ; ... . 푧 = Normalization using z-score 푥 − 휇 휎 = 23.78 − 12.816 wi-comp:39050 a cex:Normalize ; cex:stdDesv "12.766"^^xsd:double ; cex:mean "12.816"^^xsd:double ; cex:slice wi-slice:sliceITUB_2011 ; cex:observation obs:obs8165 ; . obs:obs8165 a qb:Observation ; cex:value "23.78"^^xsd:double ; ... . 12.766 = 0.859 wi-slice:sliceITU_B_2011 a qb:Slice ; qb:observation obs:8471, obs:8434, ...; . ok! URI of computed observation: http://data.webfoundation.org/webindex/v2013/observation/computed_2011_1386752461095_39049
  • 15. Verifying linked data contents Once the linked data has been published How can an external agent verify it? 2 approaches: SPARQL Queries Shape expressions
  • 16. SPARQL validation CONSTRUCT queries like: CONSTRUCT { [ a cex:Error ; cex:errorParam # ... omitted cex:msg "Observation has two different values" . ] } WHERE { ?obs a qb:Observation . ?obs cex:value ?value1 . ?obs cex:value ?value2 . FILTER ( ?value1 != ?value2 ) } Detects if one observation has more than 1 value
  • 17. SPARQL validation More advanced queries like: CONSTRUCT { [ a cex:Error ; cex:errorParam # ...omitted cex:msg "Mean value does not match" ] . } WHERE { ?obs a qb:Observation ; cex:computation ?comp ; cex:value ?val . ?comp a cex:Mean . { SELECT (AVG(?value) as ?mean) ?comp WHERE { ?comp cex:observation ?obs1 . ?obs1 cex:value ?value ; } GROUP BY ?comp } FILTER (abs(?mean - ?val) > 0.0001) } Detects if an observation whose computation is declared as the mean is really the mean
  • 18. Shape Expressions description Shape expressions declare the shape of RDF data Human readable and machine processable Shape Expressions for team communication Developers know which triples must generate/consume <Observation> { rdf:type (qb:Observation) , cex:value xsd:float ? , dc:issued xsd:dateTime , rdfs:label xsd:string ? , qb:dataSet @<DataSet> , cex:ref-area @<Country> , cex:indicator @<Indicator> , cex:ref-year xsd:gYear , cex:computation @<Computation> } Documentation http://weso.github.io/wiDoc
  • 19. Visualization Visualization tool: Wesby, Inspired by Pubby Enables easy customization by templates Different templates are chosen based on rdf:type Data load on demand SPARQL queries Responsive design and mobile friendly
  • 20. Visualization Example: Template for Observations http://data.webfoundation.org/webindex/v2013/observation/obs8003
  • 21. Visualization Example: Template for Countries http://data.webfoundation.org/webindex/v2013/country/ESP
  • 22. Conclusions WebIndex: Linked data portal (medium size ≈ 3,5 mill triples) It adds data about computation Computations represented as linked data We explored some possibilities for validation SPARQL validation: very expressive, declarative Shape Expressions: more readable Visualization by templates was fine Challenge to visualize bNodes
  • 23. Future work & reflection Computex vocabulary was a first attempt Further work to employ it in similar projects Visualization of computations Define wesby templates to visualize computations Reflection: Was it worth the effort? We produced data that can be externally verified However... ...we still don't have consumers who need it
  • 24. End of presentation More info: WESO Research group http://www.weso.es