Big Data Standards: how to set the bar?
Susanna-Assunta Sansone, PhD
@biosharing
@isatools
Experimental Biology, Big Data Workshop, 28 March, 2015
Data Consultant,
Honorary Academic Editor
Associate Director,
Principal Investigator
http://www.slideshare.net/SusannaSansone
https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/
Credit to:
A community mobilization for “openness”
Is open data understandable, reusable?
“Reproducing the method took several
months of effort, and required using new
versions and new software that posed
challenges to reconstructing and validating
the results”
Is open data understandable, reusable?
Not always…but why?
•  Outputs are multi-dimensional, diverse, not always well cited / stored
•  Software, code, workflows etc. are hard(er) to get hold of
•  Data often distributed and fragmented to fit (siloed) databases
o  Does not contain enough information for others to understand it
•  Uneven level of detail and annotation across different databases
o  Specialized, generalist, public and institutional
•  Data curation activities are perceived as time consuming
o  Collection and harmonization of detailed methods and experimental
steps is done/rushed at publication stage
Not just open, but FAIR data
Responsibilities lie across several stakeholder groups
Understand the benefits of sharing
FAIR datasets and enact them
Engage and assist researchers to enable
them to share FAIR datasets
Release or endorse practices
and policies, but also incentive
and credit mechanisms for
researchers, curators and
developers
Rise of a data-centric enterprise, e.g.:
Not just data, but FAIR digital research objects
•  We need to report sufficient
information to reuse the dataset
•  We must strike a balance between
depth and breadth of information
Without context data is meaningless
Information intensive experiments
•  Not too much
•  Not too little
•  But just right
And conversely….
…how not to report the experimental information
Sample name (?!)      Data file
LS1_C2_LD_TP2_P1      file1.gz
•  LS1: liver sample 1
•  C2: compound 2
•  LD: low dose
•  TP2: time point 2
•  P1: protocol 1
•  file1.gz: compressed data file with phenotypic and other information on this sample
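The contrast between an encoded sample name and explicit annotation can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the `CODEBOOK` table, the `decode_sample_name` function and the field names in `explicit_record` are hypothetical, and the meanings of the codes are the ones listed above.

```python
# Hypothetical codebook: without this private key, the sample name
# "LS1_C2_LD_TP2_P1" cannot be interpreted by anyone else.
CODEBOOK = {
    "LS": "liver sample",
    "C": "compound",
    "LD": "low dose",
    "TP": "time point",
    "P": "protocol",
}


def decode_sample_name(name: str) -> list:
    """Expand each underscore-separated token using the codebook."""
    decoded = []
    for token in name.split("_"):
        prefix = token.rstrip("0123456789")  # letters of the code
        number = token[len(prefix):]         # trailing digits, may be empty
        decoded.append(f"{CODEBOOK[prefix]} {number}".strip())
    return decoded


# The same information reported explicitly (illustrative field names):
# every value is labelled, so no codebook is needed to read it.
explicit_record = {
    "sample": {"material": "liver", "number": 1},
    "compound": 2,
    "dose": "low",
    "time_point": 2,
    "protocol": 1,
    "data_file": "file1.gz",
}

print(decode_sample_name("LS1_C2_LD_TP2_P1"))
```

The decoder works only while the codebook travels with the data; the explicit record stays readable, by people and by software, on its own.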
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta
Sansone www.ebi.ac.uk/net-project
•  To make any dataset ‘FAIR’, one
must have standards, tools and
best practices to:
§  report sufficient details
§  capture all salient features of
the experimental workflow
§  make annotation explicit
and discoverable
§  structure the descriptions for
consistency
§  ensure/regulate access
§  deposit and publish
§  etc.
…breadth and depth
of the experimental context
…is pivotal
…and has to be both
human and machine
readable
nature.com/scientificdata
A new category of publication that provides detailed descriptors of scientifically valuable
datasets. They are a highly effective link between traditional research articles and data repositories
Introducing the
Data Descriptor
Research
papers
Data
records
Data
Descriptors
To add value to research articles and data records
Experimental metadata or
structured component
(in-house curated, machine-readable format)
Article or
narrative component
(PDF and HTML)
Data Description narrative and structured components
A curated, structured component - why?
•  Supplements the scientific discourse
o  natural language has a degree of ambiguity
•  Brings clarity in reporting research methods and procedures
o  no trimming, no cooking
o  clear links from samples to data files, and their relation to methods
•  Provides the basis for search and discovery features
[Figure: SciData Data Descriptors (DD), each with structured content; labels: same tissue, same organism, same assay, community data repositories]
Seven week old C57BL/6N mice were treated
with low-fat diet.
Liver was dissected out, hepatocytes prepared…
From natural language to ‘computable’ concepts
Data Curation Editor
Responsible for creating the structured
component, ensuring that the most
appropriate metadata is being captured.
Age value
Unit
Strain name
Subject of the experiment
Type of diet and
experimental condition
Anatomy part
Type of protocol – cell preparation
Type of protocol - sample treatment
Type of protocol – liver preparation
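As a rough sketch of what the ‘computable’ form of that sentence could look like, the annotations named above can be written as a structured record. The field names and layout below are assumptions made for illustration; they are not the format used by the Data Curation Editor, and a real structured component would point to ontology terms rather than plain strings.

```python
import json

# Illustrative structure only: each annotation type named on the slide
# becomes an explicit, labelled field.
structured = {
    "subject": {
        "organism": "mouse",                  # subject of the experiment
        "strain": "C57BL/6N",                 # strain name
        "age": {"value": 7, "unit": "week"},  # age value and unit
    },
    "treatment": {                            # type of diet and experimental condition
        "factor": "diet",
        "condition": "low-fat diet",
    },
    "anatomy_part": "liver",                  # anatomy part
    "protocols": [                            # types of protocol
        {"type": "sample treatment"},
        {"type": "liver preparation"},
        {"type": "cell preparation"},
    ],
}


def protocol_types(record: dict) -> list:
    """Return the protocol types in the order they were applied."""
    return [protocol["type"] for protocol in record["protocols"]]


print(json.dumps(structured, indent=2))
print(protocol_types(structured))
```

Once the age, strain and tissue are separate fields, a query such as "all datasets on liver from C57BL/6N mice" becomes a simple comparison instead of a text-mining problem.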
Including minimum
information reporting
requirements, or
checklists to report the
same core, essential
information
Including controlled
vocabularies, taxonomies,
thesauri, ontologies etc. to
use the same word and
refer to the same ‘thing’
Including conceptual
model, conceptual
schema from which an
exchange format is derived
to allow data to flow from
one system to another
Community-developed content standards
To structure and enrich the description of datasets, facilitating
understanding, sharing and reuse
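How the three kinds of content standard work together can be sketched as follows. Everything here is a made-up miniature for illustration: `REQUIRED_FIELDS` stands in for a minimum information checklist, `VOCABULARY` for a controlled vocabulary, and JSON for an exchange format; none of them reproduces an actual community standard.

```python
import json

# Hypothetical minimum-information checklist: the core fields to report.
REQUIRED_FIELDS = ["organism", "strain", "anatomy_part", "assay"]

# Hypothetical controlled vocabulary: synonyms map to one preferred term,
# so that the same word refers to the same 'thing'.
VOCABULARY = {
    "liver": "liver",
    "hepatic tissue": "liver",
    "mouse": "mouse",
    "mice": "mouse",
}


def missing_fields(record: dict) -> list:
    """Checklist: report which required fields are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def normalize(term: str):
    """Terminology: return the preferred term, or None if the term is unknown."""
    return VOCABULARY.get(term.strip().lower())


def to_exchange_format(record: dict) -> str:
    """Format: serialize with sorted keys so every system reads the same document."""
    return json.dumps(record, sort_keys=True)


record = {"organism": "Mice", "strain": "C57BL/6N", "anatomy_part": "Liver"}
record["organism"] = normalize(record["organism"])
record["anatomy_part"] = normalize(record["anatomy_part"])

print(missing_fields(record))
print(to_exchange_format(record))
```

The checklist flags what is missing, the vocabulary removes spelling and synonym variation, and the format lets the cleaned record move between systems unchanged.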
de jure de facto
grass-roots
groups
standard
organizations
Community mobilization, some examples
•  Structural and operational differences
§  organization types (open, closed to members, society, WG etc.)
§  standards development (how to formulate, conduct and maintain)
§  adoption, uptake, outreach (link to journals, funders and commercial sector)
§  funds (sponsors, memberships, grants, volunteering)
~ 156
~ 70
~ 334
MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …
AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
In the life sciences… almost 600!
Databases, annotation, curation tools implementing standards
A web-based, curated and searchable registry ensuring that
standards are registered, informative and discoverable; monitoring their
development and evolution and their use in databases,
and the adoption of both in data policies.
Launched Jan 2011
Core functionalities:
•  search and filtering, e.g. by
funder
•  submission forms to add
new records
•  “claim” functionality for
existing records
•  person’s profile (as
maintainer of records)
linked to the ORCID
profile (for credit, as
incentive)
•  visualization and views of
content
Search, filter, claim, view and more
Assists users to make informed decisions
Advisory Board and Working Group - core members and adopters
Operational Team
The relationship among popular standard formats for pathway information.
Demir et al., The BioPAX community standard for pathway data sharing, Nat Biotech. 2010.
Standards as an area of research - still a lot to do! E.g.:
1. Create relation or “usage maps and guides”, e.g.:
2. Metrics of maturity, usability and popularity
3. Embed in the ecosystem of complementary registries
Technologically-delineated views of the world: transcriptomics, proteomics, metabolomics
Biologically-delineated views of the world: plant biology, epidemiology, microbiology
Generic features (common core)
- description of source biomaterial
- experimental design components
Arrays, Scanning, Columns, Gels, MS, FTIR, NMR
To compare and integrate data we need interoperable standards
How do we address fragmentation, duplication and gaps?
Global alliances are needed, e.g.:
biocaddie.org
metadatacenter.org
•  Most researchers
understand the value of
standardized descriptions,
when using third-party
datasets
•  But when asked to structure
their datasets, they view
requests for even “minimal”
information as burdensome
➢  There is an urgent need to
lower the bar for authoring
good metadata
Researchers hate standards!

Principles and Practices of Data VisualizationKianJazayeri1
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 

Dernier (20)

What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 

Big Data Standards - Workshop, ExpBio, Boston, 2015

  • 1. Big Data Standards: how to set the bar? Susanna-Assunta Sansone, PhD (@biosharing, @isatools). Experimental Biology, Big Data Workshop, 28 March, 2015. Data Consultant, Honorary Academic Editor, Associate Director, Principal Investigator. http://www.slideshare.net/SusannaSansone
  • 3. A community mobilization for “openness”
  • 4. Is open data understandable, reusable? “Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results”
  • 5. Is open data understandable, reusable? Not always…but why? • Outputs are multi-dimensional, diverse, not always well cited / stored • Software, codes, workflows etc. are hard(er) to get hold of • Data are often distributed and fragmented to fit (siloed) databases o They do not contain enough information for others to understand them • Uneven level of detail and annotation across different databases o Specialized, generalist, public and institutional • Data curation activities are perceived as time consuming o Collection and harmonization of detailed methods and experimental steps is done/rushed at publication stage
  • 6. Not just open, but FAIR data
  • 7. Responsibilities lie across several stakeholder groups: Understand the benefits of sharing FAIR datasets and enact them. Engage and assist researchers to enable them to share FAIR datasets. Release or endorse practices and policies, but also incentive and credit mechanisms for researchers, curators and developers.
  • 8. Rise of a data-centric enterprise, e.g.:
  • 9. Not just data, but FAIR digital research objects
  • 10. •  We need to report sufficient information to reuse the dataset •  We must strike a balance between depth and breadth of information Without context data is meaningless
  • 11. Information intensive experiments •  Not too much •  Not too little •  But just right
  • 13. …how not to report the experimental information. Sample name (?!): LS1_C2_LD_TP2_P1; data file: file1.gz • LS1: liver sample 1 • C2: compound 2 • LD: low dose • TP2: time point 2 • P1: protocol 1 • file1.gz: compressed data file with phenotypic and other information on this sample
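The problem with a name like LS1_C2_LD_TP2_P1 is that it can only be read with a private legend. As a minimal sketch of that point, the Python below expands the name into explicit fields using the legend given on the slide. The field names (`sample`, `compound`, `dose`, `time_point`, `protocol`, `data_file`) and the function `decode_sample_name` are illustrative assumptions, not part of any standard mentioned in the talk.

```python
# Legend copied from the slide; the field names on the left of each
# tuple are hypothetical labels chosen for this sketch.
LEGEND = {
    "LS1": ("sample", "liver sample 1"),
    "C2": ("compound", "compound 2"),
    "LD": ("dose", "low dose"),
    "TP2": ("time_point", "time point 2"),
    "P1": ("protocol", "protocol 1"),
}


def decode_sample_name(name, legend):
    """Expand an underscore-delimited sample name into explicit fields.

    Raises KeyError when a token has no legend entry, which is exactly
    the situation a third party is in when only the name is shared.
    """
    record = {}
    for token in name.split("_"):
        if token not in legend:
            raise KeyError(f"no legend entry for token {token!r}")
        field, meaning = legend[token]
        record[field] = meaning
    return record


record = decode_sample_name("LS1_C2_LD_TP2_P1", LEGEND)
record["data_file"] = "file1.gz"
print(record)
```

Without `LEGEND` the call fails on the first token, whereas the expanded record can be shared and read on its own.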
  • 14. The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project • To make any dataset ‘FAIR’, one must have standards, tools and best practices to: § report sufficient details § capture all salient features of the experimental workflow • make annotation explicit and discoverable • structure the descriptions for consistency • ensure/regulate access • deposit and publish • etc.
  • 15. …breadth and depth of the experimental context …is pivotal …and has to be both human and machine readable
  • 16. nature.com/scientificdata A new category of publication that provides detailed descriptors of scientifically valuable datasets. They are a highly effective link between traditional research articles and data repositories Introducing the Data Descriptor
  • 17. Research papers Data records Data Descriptors To add value to research articles and data records
  • 18. Data Description: narrative and structured components. Article or narrative component (PDF and HTML). Experimental metadata or structured component (in-house curated, machine-readable format).
  • 19. A curated, structured component - why? • Supplements the scientific discourse o natural language has a degree of ambiguity • Brings clarity in reporting research methods and procedures o no trimming, no cooking o clear links from samples to data files, and their relation to methods • Provides the basis for search and discovery features [Figure: SciData Data Descriptors (DD) with structured content, linked by same tissue, same organism and same assay to community data repositories]
  • 20. Seven week old C57BL/6N mice were treated with low-fat diet. Liver was dissected out, hepatocytes prepared… From natural language to ‘computable’ concepts Data Curation Editor Responsible for creating the structured component, ensuring that the most appropriate metadata is being captured.
  • 21. Age value Unit Strain name Subject of the experiment Type of diet and experimental condition Anatomy part Seven week old C57BL/6N mice were treated with low-fat diet. Liver was dissected out, hepatocytes prepared … From natural language to ‘computable’ concepts
  • 22. Age value Unit Strain name Subject of the experiment Type of diet and experimental condition Anatomy part Seven week old C57BL/6N mice were treated with low-fat diet. Liver was dissected out, hepatocytes prepared … From natural language to ‘computable’ concepts Type of protocol – cell preparation Type of protocol - sample treatment Type of protocol – liver preparation
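As a rough illustration of what the ‘computable’ form of that sentence could look like, the sketch below records the concepts labelled on the slide (age value, unit, strain, subject, diet, anatomy part, protocol types) as typed fields. The class `SampleAnnotation`, its field names and the helper `matches` are assumptions made for this example; they are not the schema used by Scientific Data or ISA-Tab.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleAnnotation:
    """Hypothetical structured record for one sample description."""
    organism: str        # subject of the experiment
    strain: str          # strain name
    age_value: int       # age value
    age_unit: str        # unit
    diet: str            # type of diet and experimental condition
    anatomy_part: str    # anatomy part
    protocols: tuple     # types of protocol applied, in order


# Values taken from the example sentence on the slide.
annotation = SampleAnnotation(
    organism="mouse",
    strain="C57BL/6N",
    age_value=7,
    age_unit="week",
    diet="low-fat diet",
    anatomy_part="liver",
    protocols=("sample treatment", "liver preparation", "cell preparation"),
)


def matches(record, **criteria):
    """Return True when every given field equals the requested value."""
    return all(getattr(record, key) == value for key, value in criteria.items())


print(matches(annotation, organism="mouse", anatomy_part="liver"))
```

Once the description is held in fields like these, a query such as "same organism, same tissue" becomes a simple comparison rather than a text search over prose.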
  • 23. Community-developed content standards: to structure and enrich the description of datasets, facilitating understanding, sharing and reuse. • Minimum information reporting requirements, or checklists, to report the same core, essential information • Controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same ‘thing’ • Conceptual models or conceptual schemas, from which an exchange format is derived, to allow data to flow from one system to another
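To make the first two kinds of standard concrete, here is a minimal sketch of how a checklist and a controlled vocabulary might be applied to a metadata record. The required fields in `REQUIRED_FIELDS` and the terms in `ALLOWED_AGE_UNITS` are invented for illustration; real checklists such as MIAME and real ontologies define their own content.

```python
# Hypothetical minimum-information checklist: fields every record must fill in.
REQUIRED_FIELDS = ("organism", "strain", "age_value", "age_unit", "anatomy_part")

# Hypothetical controlled vocabulary for one field.
ALLOWED_AGE_UNITS = {"day", "week", "month", "year"}


def validate(record):
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing required field: {field}")
    unit = record.get("age_unit")
    if unit not in (None, "") and unit not in ALLOWED_AGE_UNITS:
        problems.append(f"age_unit {unit!r} is not a controlled term")
    return problems


complete = {
    "organism": "mouse",
    "strain": "C57BL/6N",
    "age_value": 7,
    "age_unit": "week",
    "anatomy_part": "liver",
}
incomplete = {"organism": "mouse", "age_value": 7, "age_unit": "wks"}

print(validate(complete))
print(validate(incomplete))
```

The checklist catches what is absent and the vocabulary catches what is spelled inconsistently. The third kind of standard, an exchange format, would govern how such a record is serialized between systems.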
  • 24. Community mobilization, some examples: de jure, de facto; grass-roots groups, standards organizations • Structural and operational differences § organization types (open, closed to members, society, WG etc.) § standards development (how to formulate, conduct and maintain) § adoption, uptake, outreach (link to journals, funders and commercial sector) § funds (sponsors, memberships, grants, volunteering)
  • 25. In the life sciences…almost 600 (~156, ~70, ~334): MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT; MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB; AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO. Databases, annotation, curation tools implementing standards.
  • 26. A web-based, curated and searchable registry ensuring that standards are registered, informative and discoverable; monitoring their development and evolution and their use in databases, and the adoption of both in data policies. Launched Jan 2011
  • 27. Search, filter, claim, view and more. Core functionalities: • search and filtering, e.g. by funder • submission forms to add new records • “claim” functionality for existing records • person’s profile (as maintainer of records) associated with the ORCID profile (for credit, as incentive) • visualization and views of content
  • 28. Assists users to make informed decisions
  • 29. Advisory Board and Working Group - core members and adopters Operational Team
  • 30. Standards as an area of research - still a lot to do! E.g.: 1. Create relation or “usage maps and guides”, e.g. the relationship among popular standard formats for pathway information (Demir, et al., The BioPAX community standard for pathway data sharing, Nat Biotech. 2010) 2. Metrics of maturity, usability and popularity 3. Embed in the ecosystem of complementary registries
  • 31. Technologically-delineated views of the world and biologically-delineated views of the world. Generic features (common core): description of source biomaterial, experimental design components. [Figure: technologies (arrays, scanning, columns, gels, MS, FTIR, NMR) set against domains (transcriptomics, proteomics, metabolomics, plant biology, epidemiology, microbiology)] To compare and integrate data we need interoperable standards. How do we address fragmentation, duplications and gaps?
  • 32. Global alliances are needed, e.g.:
  • 35. Researchers hate standards! • Most researchers understand the value of standardized descriptions, when using third-party datasets • But when asked to structure their datasets, they view requests for even “minimal” information as burdensome Ø There is an urgent need to lower the bar for authoring good metadata
  • 36. Researchers hate standards! • Most researchers understand the value of standardized descriptions, when using third-party datasets • But when asked to structure their datasets, they view requests for even “minimal” information as burdensome Ø There is an urgent need to lower the bar for authoring good metadata