Open Data Convergence
WHITE PAPER

International Institute of Information Technology, Bangalore
Table of Contents
Abstract
Introduction
Problem Definition
Solution
Benefits
Conclusion
Contact Information
Open Data Convergence | White Paper

Abstract
The Open Data initiative at data.gov.in provides a platform for sharing datasets
from more than 55 departments, with around 5,000 datasets published to date.
These datasets are available in various formats such as Excel, XML, and CSV.
Consumers of these datasets face two major issues. The first is data integration:
currently there is no means to check which datasets are semantically linked with each
other. The second is data quality: there is no uniform data representation format and
no quality-check metric to show missing or invalid data.
In this white paper we propose a concept termed ‘Data Convergence’, which
provides a unified view of data collected from different sources (or departments) in
different formats. To demonstrate this, we built a software application that provides
the unified view by means of an HTTP API in JSON/XML formats.
The main objective is to maintain loose coupling between the underlying storage
structure and the consumer client.

“Make each bit count by creating a network of datasets”

Introduction
A major shortcoming of most open data portals (such as http://data.gov.in) is that they
do not provide any API to their datasets. Some portals, like http://data.worldbank.org/
and http://data.gov, provide an API, but it is limited to producing output for one
particular dataset. They do have a custom API builder, but there is no way to merge or
combine two or more different datasets that share one or more common
attributes/dimensions.
Just having a lot of data is useless unless we can derive useful information from it.
For example, from the current representation of open data it is not obvious how to
identify the effect of weather on the number of tourist visits, as the two datasets belong
to two different departments, earth science and tourism respectively. A consumer may
also not be aware of the existence of a dependent dataset that could have worked as a
catalyst in his/her analysis. It would be much better to have an API that identifies how
datasets are connected/linked, and on which attribute/dimension, and produces a
unified view of multiple datasets.
Linked Dataset is the foundation of the complete Data Convergence system. It consists
of a large number of graphs in which a node represents a dataset and an edge between
two nodes denotes the existence of a relationship. Linked Data helps to identify
correlation among different parameters in different datasets.
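The graph described above can be sketched in a few lines of Python. This is only an illustration: the dataset names, the field names, and the rule "two datasets are linked when they share a field name" are assumptions, since the paper does not say how links are stored or derived.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical catalogue: dataset name -> set of field names (illustrative only).
CATALOGUE = {
    "rainfall": {"state", "year", "rainfall_mm"},
    "tourism_statistics": {"state", "year", "tourist_visits"},
    "road_accident": {"state", "accidents"},
    "budget": {"ministry", "amount"},
}


def build_link_graph(catalogue):
    """Return an adjacency map: dataset -> {neighbour: shared field names}."""
    graph = defaultdict(dict)
    for a, b in combinations(sorted(catalogue), 2):
        shared = catalogue[a] & catalogue[b]
        if shared:  # an edge exists only when at least one field is common
            graph[a][b] = shared
            graph[b][a] = shared
    return graph


graph = build_link_graph(CATALOGUE)
for neighbour, fields in sorted(graph["rainfall"].items()):
    print(neighbour, sorted(fields))
```

Each edge carries the set of shared fields, so a consumer can see not only that two datasets are linked but also on which attribute/dimension.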
Common flaws such as differing file formats, lack of consistent representation, and
missing/invalid values can also be solved to a certain extent in the preprocessing
phase of data convergence.
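One possible quality check for the preprocessing phase is the share of missing values per field. The sketch below is an assumption about what such a metric could look like, not the metric used by the authors; the list of "missing" markers and the sample records are made up for illustration.

```python
# Values treated as "missing" are an assumption; real datasets may use others.
MISSING_MARKERS = (None, "", "NA", "N/A", "-")


def missing_ratio(records):
    """Return {field: fraction of records where the field is absent or missing}."""
    if not records:
        return {}
    fields = set()
    for record in records:
        fields.update(record)
    total = len(records)
    return {
        field: sum(1 for r in records if r.get(field) in MISSING_MARKERS) / total
        for field in sorted(fields)
    }


# Placeholder records, not real data.
records = [
    {"state": "A", "year": 2012, "rainfall_mm": 100},
    {"state": "A", "year": 2013, "rainfall_mm": "NA"},
    {"state": "", "year": 2013},
    {"state": "B", "year": 2013, "rainfall_mm": 120},
]
ratios = missing_ratio(records)
print(ratios)
```

A field that is absent from a record counts as missing, which is why `rainfall_mm` is penalised for the third record as well as for the explicit "NA".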

Problem Definition
In most open data portals, a consumer has to browse through a collection of datasets
and select one dataset at a time. The user has no means to know the relationships/links
among different datasets. Unless one goes through all datasets manually, he/she won’t
get a clear idea of the dependency and connectivity among datasets on a given
parameter, which could have served as more aggregated information.
There should be a provision for the consumer to specify which data he/she wants, in
which format, and in what quantity, and the system should be able to identify how to
process this query and display the relevant information.
There should also be a mechanism for searching for a dataset not only by its name /
department, but also by its content / field / attribute / dimension.
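A search of this kind can be sketched as a case-insensitive substring match over a dataset's name, department, and field names. The catalogue layout and its entries below are hypothetical; searching by record content would need an index over the data itself and is not shown.

```python
# Hypothetical catalogue metadata (illustrative only).
CATALOGUE = {
    "rainfall": {"department": "earth science", "fields": ["state", "year", "rainfall_mm"]},
    "tourism statistics": {"department": "tourism", "fields": ["state", "tourist_visits"]},
    "budget": {"department": "finance", "fields": ["ministry", "amount"]},
}


def search_datasets(catalogue, term):
    """Return names of datasets whose name, department, or any field contains term."""
    needle = term.lower()
    hits = []
    for name, meta in catalogue.items():
        haystack = [name, meta["department"], *meta["fields"]]
        if any(needle in text.lower() for text in haystack):
            hits.append(name)
    return sorted(hits)


print(search_datasets(CATALOGUE, "state"))
```

Searching for a field name such as `state` returns every dataset that carries that dimension, which is the starting point for discovering linked datasets.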
The system should be capable of displaying all linked datasets for a given dataset, so
that the user can visualize a graph of linked datasets. It must also be able to converge
these datasets and produce a unified view.

“Leveraging the open data platform with the help of converged datasets”

Solution
Overview
At a high level, the Data Convergence system takes three inputs from the consumer:
which datasets the user wants to converge, in which format, and how many records per
page. It then processes the input and generates an API that gives a unified view of the
datasets.
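The three inputs can be modelled as a small request object that is rendered into an API URL. The sketch below reuses the query parameters that appear in the paper's sample API (key, offset, limit, format); the class name, the function name, the host `opendata.example`, and the key value are hypothetical, and how the system derives a key from the selected datasets is not described in the paper, so the key is passed in as an opaque string.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class ConvergenceRequest:
    datasets: tuple      # which datasets to converge
    fmt: str = "json"    # output format: "json" or "xml"
    page_size: int = 10  # records per page

    def __post_init__(self):
        # Reject inputs the API could not serve.
        if not self.datasets:
            raise ValueError("select at least one dataset")
        if self.fmt not in ("json", "xml"):
            raise ValueError("format must be 'json' or 'xml'")
        if self.page_size <= 0:
            raise ValueError("page size must be positive")


def build_api_url(base_url, key, request, offset=1):
    """Render the request as a URL; `key` is assumed to identify the dataset group."""
    query = urlencode({
        "key": key,
        "offset": offset,
        "limit": request.page_size,
        "format": request.fmt,
    })
    return f"{base_url}?{query}"


request = ConvergenceRequest(datasets=("rainfall", "tourism statistics"))
url = build_api_url("http://opendata.example/api.jsp", "demo-key", request)
print(url)
```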

Description
A typical use case in the Data Convergence system is described below.
1. The Data Convergence system has a Smart API builder which provides two kinds of
user interface for selecting a collection of datasets.
1.1 Navigational
a. Step-by-step navigation to select datasets, with filters such as
country, state, district, etc.
b. The user first selects a department, then a master dataset, and finally chooses
as many datasets as needed.
1.2 Search based
a. The user can search the open data repository by data, dataset name, dataset
description, and dataset field/attribute/dimension name.
b. The user selects any number of the datasets displayed in the query result.
2. From the given collection of datasets, the system identifies connected components.
For example, assume the user selects the datasets Authorize Travel agency, tourism
statistics, rainfall, storm and temperature. Then two connected components are
identified, viz. {[Authorize Travel agency, tourism statistics], [rainfall, storm,
temperature]}, depending upon what relationship was specified by the dataset
producer while uploading into the open data repository.
3. An API is generated for each connected component, which provides the flexibility to
modify the number of records per page, the offset, and the output format (json/xml).
Sample API:
http://opendata.com/api.jsp?key=fbd7939d674997cdb4692d34de8633c4&offset=1&limit=10&format=json
4. Whenever a user performs a GET on the API using the key, our Data Convergence
system retrieves the metadata of the API query, creates a temporary unified view, and
returns the result set in JSON/XML format.
5. JSON and XML are standard data exchange formats, so they can be easily integrated
into any application.
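Step 2 above can be sketched as a connected-components search over the relationships declared by dataset producers. The function below is an illustration, not the authors' implementation; the `links` list is an assumed input that mirrors the example in step 2.

```python
def connected_components(selected, links):
    """Group the selected datasets into connected components.

    selected: dataset names chosen by the user.
    links: (a, b) pairs declared as related by dataset producers (assumed input).
    """
    selected = list(selected)
    chosen = set(selected)
    neighbours = {name: set() for name in selected}
    for a, b in links:
        if a in chosen and b in chosen:  # ignore links to unselected datasets
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, components = set(), []
    for start in selected:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        components.append(sorted(component))
    return components


selected = ["Authorize Travel agency", "tourism statistics",
            "rainfall", "storm", "temperature"]
links = [("Authorize Travel agency", "tourism statistics"),
         ("rainfall", "storm"), ("storm", "temperature")]
components = connected_components(selected, links)
print(components)
```

Only links between selected datasets are followed, so two selected datasets that are related solely through an unselected one end up in separate components. Whether the real system behaves this way is not stated in the paper.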
The Data Convergence system also provides visualization of all connected datasets up
to the 3rd level (extensible). A few more functionalities, such as filtering attributes and
records in the unified view / result set, can be added.
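Collecting the datasets to visualize "up to the 3rd level" amounts to a depth-limited breadth-first traversal of the link graph. The sketch below assumes the graph is available as an adjacency map; the function name and the sample graph are hypothetical.

```python
from collections import deque


def linked_within(graph, start, max_depth=3):
    """Return {dataset: level} for datasets reachable within max_depth links."""
    levels = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if levels[node] == max_depth:
            continue  # do not expand beyond the requested level
        for neighbour in graph.get(node, ()):
            if neighbour not in levels:
                levels[neighbour] = levels[node] + 1
                queue.append(neighbour)
    del levels[start]  # the starting dataset is not its own neighbour
    return levels


# Hypothetical chain of linked datasets: a - b - c - d - e.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(linked_within(chain, "a"))
```

Making `max_depth` a parameter reflects the "extensible" remark: a deeper view only requires a larger value.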

(fig. Data Convergence System Block Diagram)

The following image shows the converged output of two different datasets, Road Accident
and Person Injured, for the state of Arunachal Pradesh in two different formats, JSON
and XML.

(fig. API response in JSON and XML format)
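A converged result of this kind can be approximated by joining two record sets on a shared attribute and returning one page of the result. The sketch below is an assumption about how a temporary unified view might be built; the field names and values are placeholders rather than real figures, and the response layout shown in the figure is not reproduced.

```python
import json


def converge(left, right, key):
    """Inner-join two lists of records on `key` and merge their fields."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    unified = []
    for row in left:
        for match in index.get(row[key], []):
            unified.append({**row, **match})
    return unified


def page(records, offset=1, limit=10):
    """Return one page; offset is treated as a 1-based record position (assumption)."""
    begin = max(offset, 1) - 1
    return records[begin:begin + limit]


# Placeholder records, not real statistics.
road_accident = [{"state": "State A", "accidents": 10},
                 {"state": "State B", "accidents": 20}]
person_injured = [{"state": "State A", "injured": 7},
                  {"state": "State C", "injured": 3}]

unified = converge(road_accident, person_injured, "state")
print(json.dumps(page(unified, offset=1, limit=10), indent=2))
```

An inner join keeps only the states present in both datasets. A real unified view might instead keep unmatched rows and mark the gaps, which ties in with the data quality concern raised in the abstract.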

Benefits
The Data Convergence System provides single, uniform HTTP API access to a unified
view of datasets. It also lets the user visualize the network of connected
datasets.
Key Highlights
• Easy access to converged data through an API
• Standard data exchange formats (JSON and XML) are supported
• Flexible (select dimensions as required)
• Unified view supports real-time data
• Loose coupling
• Notifications generated with change data capture (CDC)

(fig. Visualization of Datasets relationship)

Conclusion
This document has attempted to introduce Data Convergence, a concept that provides a
unified view of the datasets present in an open data portal and improves the way open
data is accessed.
Consumers of open data can now gain a much better understanding of datasets and
their relationships, which will eventually stimulate their data analysis process.

Contact Information
Prof. Chandrashekhar Ramanathan
Tel: +91 80 4140 7777
rc@iiitb.ac.in

Bisen Vikrantsingh M.
Tel: +91 8792708719
Bisenvikrantsingh.mohansingh@iiitb.org

Institute
IIIT Bangalore
IIIT-Bangalore, 26/C, Electronics
City, Hosur Road, Bangalore,
560100
Tel: +91 80 4140 7777 / 2852 7627
http://www.iiitb.ac.in

Kodamasimham Pridhvi
Tel: +91 8123160887
Pridhvi.kodamasimham@iiitb.org

How to write a Business Continuity Plan
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

Data Convergence White Paper

  • 1. Open Data Convergence WHITE PAPER International Institute of Information Technology, Bangalore
  • 2. Table of Contents: Abstract, Introduction, Problem Definition, Solution, Benefits, Conclusion, Contact Information
  • 3. Abstract: The Open Data initiative at data.gov.in provides a platform for sharing datasets from 55+ departments, with around 5000 datasets to date. These datasets are available in various formats such as Excel, XML, and CSV. Consumers of these datasets, however, face two major issues. The first is data integration: there is currently no means to check which datasets are semantically linked to each other. The second is data quality: there is no uniform data representation format and no quality metric to flag missing or invalid data. In this white paper we propose a concept termed ‘Data Convergence’, which provides a unified view of data collected from different sources (or departments) in different formats. To demonstrate it, we built a software application that provides the unified view through an HTTP API in JSON/XML formats. The main objective is to maintain loose coupling between the underlying storage structure and the consumer client.
  • 4. “Make each bit count – by creating network of datasets” Introduction: A major shortcoming of most open data portals (such as http://data.gov.in) is that they do not provide any API to their datasets. Some portals, like http://data.worldbank.org/ and http://data.gov, do provide an API, but it is limited to producing output for one particular dataset. They do have a custom API builder, but they offer no way to merge or combine two or more different datasets that share one or more common attributes/dimensions. Having a lot of data is of little use unless we can derive useful information from it. For example, with the current representation of open data it is not easy to identify the effect of weather on the number of tourist visits, because the two datasets belong to two different departments, Earth Science and Tourism respectively. A consumer may also be unaware of a dependent dataset that could have acted as a catalyst in his/her analysis. It would be much better to have an API that identifies how datasets are linked, and on which attribute/dimension, and produces a unified view of multiple datasets. The Linked Dataset is the foundation of the complete Data Convergence system. It consists of a large number of graphs in which a node represents a dataset and an edge between two nodes denotes the existence of a relationship. Linked Data helps to identify correlations among different parameters in different datasets. Common flaws such as differing file formats, lack of consistent representation, and missing/invalid values can also be addressed to a certain extent in the preprocessing phase of data convergence.
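To make the idea of a unified view over a common attribute/dimension concrete, here is a minimal Python sketch. It is not the system described in this paper: the field names (`state`, `year`, `rainfall_mm`, `tourist_visits`), the sample rows, and the inner-join behaviour are all assumptions chosen for illustration, and the values are placeholders rather than real statistics.

```python
from collections import defaultdict

# Hypothetical sample rows; names and values are placeholders, not real data.
rainfall = [
    {"state": "State A", "year": 2012, "rainfall_mm": 100},
    {"state": "State B", "year": 2012, "rainfall_mm": 200},
]
tourism = [
    {"state": "State A", "year": 2012, "tourist_visits": 1000},
    {"state": "State B", "year": 2012, "tourist_visits": 2000},
    {"state": "State C", "year": 2012, "tourist_visits": 3000},
]


def common_dimensions(left, right):
    """Return the field names that appear in every row of both datasets."""
    if not left or not right:
        return []
    left_fields = set.intersection(*(set(row) for row in left))
    right_fields = set.intersection(*(set(row) for row in right))
    return sorted(left_fields & right_fields)


def unified_view(left, right):
    """Inner-join two datasets on all the dimensions they share."""
    keys = common_dimensions(left, right)
    if not keys:
        return []  # nothing in common, so there is nothing to converge on
    # Index the right-hand dataset by the values of the shared dimensions.
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)
    merged = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            merged.append({**row, **match})
    return merged


if __name__ == "__main__":
    print(common_dimensions(rainfall, tourism))
    for record in unified_view(rainfall, tourism):
        print(record)
```

An inner join is only one possible choice. A real system might prefer to keep unmatched rows and mark the missing values, which ties in with the data quality concern raised in the abstract.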
  • 5. Problem Definition: In most open data portals, the consumer has to browse through a collection of datasets and select one dataset at a time. The user has no means of knowing the relationships or links among different datasets. Unless one goes through all the datasets manually, he/she won’t get a clear idea of the dependencies and connectivity among datasets on a given parameter, which could otherwise have yielded more aggregated information. There should be a provision for the consumer to specify which data he/she wants, in which format, and in what quantity, and the system should be able to work out how to process this query and display the relevant information. There should also be a mechanism for searching for a dataset not only by its name/department, but also by its content/field/attribute/dimension. The system should be capable of displaying all linked datasets for a given dataset, so that the user can visualize a graph of linked datasets. It must also be able to converge these datasets and produce a unified view.
  • 6. “Leveraging open data platform with the help of converged datasets” Solution
    Overview: At a high level, the Data Convergence system takes three inputs from the consumer: which datasets to converge, in which format, and how many records per page. It then processes the input and generates an API that gives a unified view of the datasets.
    Description: A typical use case of the Data Convergence system is as follows.
    1. The Data Convergence system has a Smart API builder, which provides two kinds of user interface for selecting a collection of datasets.
    1.1 Navigational
      a. Step-by-step navigation to select datasets, with filters such as country, state, district, etc.
      b. The user first selects a department, then a master dataset, and finally chooses as many datasets as required.
    1.2 Search based
      a. The user can search the open data repository by data, dataset name, dataset description, and dataset field/attribute/dimension name.
      b. The user selects any number of the datasets displayed in the query result.
    2. From the given collection of datasets, the system identifies connected components. For example, assume the user selects the datasets Authorized Travel Agency, Tourism Statistics, Rainfall, Storm, and Temperature. Two connected components are then identified, viz. {[Authorized Travel Agency, Tourism Statistics], [Rainfall, Storm, Temperature]}, depending upon what relationships the dataset producer specified while uploading into the open data repository.
  • 7. (Solution, continued)
    3. An API is generated for each connected component, which gives the flexibility to modify the number of records per page, the offset, and the output format (json/xml). Sample API: http://opendata.com/api.jsp?key=fbd7939d674997cdb4692d34de8633c4&offset=1&limit=10&format=json
    4. Whenever a user performs a GET on the API using the key, the Data Convergence system retrieves the metadata of the API query, creates a temporary unified view, and returns the result set in JSON/XML format.
    5. JSON and XML are standard data exchange formats and can be easily integrated into any application.
    The Data Convergence system also provides visualization of all connected datasets up to the 3rd level (extensible). A few more functionalities, such as filtering attributes and records in the unified view/result set, can be added. (fig. Data Convergence System Block Diagram)
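Step 2 of the use case, identifying connected components among the selected datasets, can be sketched with a breadth-first search over the link graph. This is only an illustration: the paper does not say which algorithm the system uses, and the `links` list below is an assumed set of producer-specified relationships chosen so that the result matches the example in the text.

```python
from collections import defaultdict, deque


def connected_components(selected, links):
    """Group the selected datasets into connected components.

    `selected` is a list of dataset names; `links` is a list of
    (dataset, dataset) pairs declared by the dataset producers.
    Links that touch a dataset outside the selection are ignored.
    """
    chosen = set(selected)
    graph = defaultdict(set)
    for a, b in links:
        if a in chosen and b in chosen:
            graph[a].add(b)
            graph[b].add(a)

    seen = set()
    components = []
    for start in selected:
        if start in seen:
            continue
        # Breadth-first search from every dataset not yet visited.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Datasets named in the example above; the links are assumed for illustration.
selected = ["Authorized Travel Agency", "Tourism Statistics",
            "Rainfall", "Storm", "Temperature"]
links = [
    ("Authorized Travel Agency", "Tourism Statistics"),
    ("Rainfall", "Storm"),
    ("Storm", "Temperature"),
]
components = connected_components(selected, links)

if __name__ == "__main__":
    for component in components:
        print(component)
```

Under these assumed links the function returns two groups, one for the tourism datasets and one for the weather datasets, and the system would then generate one API per group as described in step 3.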
  • 8. The following image shows the converged output of two different datasets, Road Accident and Person Injured, for the state of Arunachal Pradesh in two different formats, JSON and XML. (fig. API response in JSON and XML format)
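Since the figure itself does not survive in this transcript, the following Python sketch shows one way a request like the sample API URL in the Solution section could be turned into a paged JSON or XML response. It is a guess at the behaviour, not the authors' implementation: the 1-based meaning of `offset`, the `records`/`record` element names, the field names, and the row values are all assumptions, and the numbers are placeholders rather than real statistics.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlparse

# Placeholder rows standing in for a converged Road Accident / Person Injured view.
rows = [
    {"state": "Arunachal Pradesh", "year": 2010, "road_accidents": 1, "persons_injured": 2},
    {"state": "Arunachal Pradesh", "year": 2011, "road_accidents": 3, "persons_injured": 4},
]


def page(records, offset=1, limit=10):
    """Return one page of records; offset is assumed to be 1-based."""
    start = max(offset - 1, 0)
    return records[start:start + limit]


def render(records, fmt="json"):
    """Serialize records as JSON or XML (field names must be valid XML tags)."""
    if fmt == "json":
        return json.dumps({"records": records})
    if fmt == "xml":
        root = ET.Element("records")
        for row in records:
            record = ET.SubElement(root, "record")
            for field, value in row.items():
                ET.SubElement(record, field).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")


def handle(url, records):
    """Read offset, limit and format from the query string and build the response."""
    params = parse_qs(urlparse(url).query)
    offset = int(params.get("offset", ["1"])[0])
    limit = int(params.get("limit", ["10"])[0])
    fmt = params.get("format", ["json"])[0]
    return render(page(records, offset, limit), fmt)


if __name__ == "__main__":
    base = "http://opendata.com/api.jsp?key=fbd7939d674997cdb4692d34de8633c4"
    print(handle(base + "&offset=1&limit=10&format=json", rows))
    print(handle(base + "&offset=1&limit=10&format=xml", rows))
```

The sketch ignores the `key` parameter. In the system described above, the key is what identifies the stored API metadata and therefore which connected component to converge.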
  • 9. Benefits: The Data Convergence system provides single, uniform HTTP API access to a unified view of datasets. It also lets the user visualize the network of connected datasets.
    Key Highlights
      • Easy access to converged data through an API
      • Standard data exchange formats (JSON and XML) are supported
      • Flexible (select dimensions as required)
      • Unified view supports real-time data
      • Loose coupling
      • Notifications generated with change data capture (CDC)
    (fig. Visualization of Datasets relationship)
  • 10. Conclusion: This document has attempted to introduce a concept, based on Data Convergence, that provides a unified view of the datasets present in an open data portal and improves the way open data is accessed. Consumers of open data can now gain a much better understanding of datasets and their relationships, which should in turn stimulate their data analysis.
  • 11. Contact Information
    Prof. Chandrashekhar Ramanathan, Tel: +91 80 4140 7777, rc@iiitb.ac.in
    Bisen Vikrantsingh M., Tel: +91 8792708719, Bisenvikrantsingh.mohansingh@iiitb.org
    Kodamasimham Pridhvi, Tel: +91 8123160887, Pridhvi.kodamasimham@iiitb.org
    Institute: IIIT Bangalore, 26/C, Electronics City, Hosur Road, Bangalore, 560100, Tel: +91 80 4140 7777 / 2852 7627, http://www.iiitb.ac.in