Keynote presented at the Phenotype Foundation first annual meeting.
Describes data sharing, data annotation, and the need for further development of tools, ontologies, and ontology mappings.
Amsterdam, January 18, 2016
Use of data
1. Keynote presented at the Phenotype Foundation first annual meeting.
Amsterdam, January 18, 2016
Prof. Chris Evelo
Department of Bioinformatics – BiGCaT
Maastricht University
@Chris_Evelo
The use and needs of data sharing in biology
3. Knowledge is hard to get
And it doesn’t even play it…
But you can gamify collection
Since we structure it, it can be easier to store
4. Sharing Data
I would like to exploit common genotype-phenotype relations between Alzheimer’s Disease and Huntington’s Disease…
I need to combine AD and HD data…
I can help with that!
I can help with that!
Source: Marco Roos
5. Who wants to share data?
• People who want to use data
• Funders
• Publishers
• But the researchers?
7. People hide data
• I did all this work, I want to reuse it myself
• They don’t need this part, might be my next…
• I might get a patent on this
• Or… It needs a patent to be valuable
• I can’t even patent because ...
8. How?
• Don’t add specifics
(ohh those really were knockout cells, but..)
• Leave out important steps
(I did these PCRs, why show the array)
• And “we used an approach slightly modified from…”
• ...
12. Sharing Linkable Data
Source: Marco Roos
I can go straight to answering my questions with data from multiple data owners!
Patients will be so pleased with this speed-up!
Here’s my Linked Data, have fun!
Here’s my Linked Data, have fun!
13. Really?
From terms “liver, hepar, hepatic tissue”
To URIs:
http://identifiers.org/tissueont1/liver
http://identifiers.org/tissueont2/hepar
….
Just a first step
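A minimal sketch of this first step, assuming a hand-made lookup table (`TERM_TO_URI` and `term_to_uri` are hypothetical names, and the two URIs are the placeholder examples from the slide, not verified identifiers). It shows why this is only a first step: one term stays unmapped, and the two mapped terms still end up with different URIs.

```python
from typing import Optional

# Illustrative table only: the URIs are the placeholders shown on the slide.
# "hepatic tissue" is deliberately left out, because the slide gives no URI for it.
TERM_TO_URI = {
    "liver": "http://identifiers.org/tissueont1/liver",
    "hepar": "http://identifiers.org/tissueont2/hepar",
}


def term_to_uri(term: str) -> Optional[str]:
    """Return the URI for a free-text tissue term, or None if it is unmapped."""
    return TERM_TO_URI.get(term.strip().lower())


annotations = ["Liver", "hepar", "hepatic tissue"]
resolved = {term: term_to_uri(term) for term in annotations}

for term, uri in resolved.items():
    print(f"{term!r} -> {uri}")
```

Even with every term resolved, data annotated with the first ontology would not automatically link to data annotated with the second; that needs a mapping between the ontologies.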
14. And we didn’t even get that…
Reality:
Ontology-inspired pull-down menus
15. Nothing is ever “same-as”
• We may need more meaningful predicates
• Or learn to use them better
• We need lenses, context matters
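One way to picture predicates and lenses, as a rough sketch rather than any specific tool's implementation: each mapping carries a predicate, and a lens is the set of predicates a given context is willing to treat as equivalent. The predicates `skos:exactMatch` and `skos:closeMatch` are standard SKOS terms; the mapping itself, the lens names, and the `equivalents` function are assumptions made for illustration, and the URIs are the placeholders from the earlier slide.

```python
# Placeholder URIs taken from the slide; not verified identifiers.
LIVER = "http://identifiers.org/tissueont1/liver"
HEPAR = "http://identifiers.org/tissueont2/hepar"

# Hypothetical mapping set: (subject, predicate, object).
MAPPINGS = [
    (LIVER, "skos:closeMatch", HEPAR),
]

# A lens names the predicates that count as "the same" in a given context.
LENSES = {
    "strict": {"skos:exactMatch"},
    "relaxed": {"skos:exactMatch", "skos:closeMatch"},
}


def equivalents(uri: str, lens: str) -> set:
    """Return the URI plus everything one mapping hop away under the lens.

    Both SKOS predicates used here are symmetric, so a mapping is followed
    in either direction. No transitive closure is computed.
    """
    accepted = LENSES[lens]
    found = {uri}
    for subject, predicate, obj in MAPPINGS:
        if predicate not in accepted:
            continue
        if subject == uri:
            found.add(obj)
        elif obj == uri:
            found.add(subject)
    return found


print(equivalents(LIVER, "strict"))
print(equivalents(LIVER, "relaxed"))
```

Under the strict lens the two terms stay separate; under the relaxed lens they are merged. Which one is right depends on the question being asked.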
21. Discussed last Friday:
Serum and adipose tissue amino acid homeostasis in the MHO (Badoud 2014)
– Objective: Integrate metabolite and gene expression profiling to elucidate the molecular distinctions between Metabolically Healthy Obese (MHO) and Metabolically Unhealthy Obese (MUO)
• Conclusion: Subcutaneous adipose tissue (SAT) gene expression profiling revealed that genes related to branched-chain amino acid catabolism and the tricarboxylic acid cycle were less down-regulated in MHO individuals compared to MUO individuals. Together, this integrated analysis revealed that MHO individuals have an intermediate amino acid homeostasis compared to LH and MUO individuals.
– (Diabetes Risk Assessment study) 3 groups: Lean Healthy (LH), MHO and MUO
• Fasting serum samples from all participants, and adipose tissue from the periumbilical region taken under local anesthesia after an overnight fast
– Initially 30 participants, 10 in each group (7 women, 3 men), but for the microarray analysis they analyzed SAT from 7 LH, 8 MHO and 8 MUO, each group having 2 men. Not very clear why; they selected samples with an RNA integrity number higher than 8
– Gene expression data only for the 23 participants
– No gender or biological information (e.g. glucose, total triglycerides, etc.)
– No initial serum metabolite concentrations (only means)
– dx.doi.org/10.1021/pr500416v
– Data can be found: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE55200
23. Adding phenotypic data
Diversity, not size, makes big data hard
SAM module
- small assays
- diverse assays
For now: annotation, used after you find the data
24. Repositories are technology driven
• Expression data
• Protein data
• Metabolomics data
• Genetic variation data
25. Repositories are technology driven
• Expression data: ArrayExpress, GEO
• Protein data: PRIDE
• Metabolomics data: MetaboLights
• Genetic variation data: dbSNP
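The list above can be read as a lookup from data type to repository. The sketch below is only an illustration of what "technology driven" means in practice: the `deposit_targets` helper and the example study are hypothetical, and the table contains only the repositories named on the slide.

```python
# Repositories as listed on the slide, keyed by the technology that produced the data.
REPOSITORIES = {
    "expression": ["ArrayExpress", "GEO"],
    "protein": ["PRIDE"],
    "metabolomics": ["MetaboLights"],
    "genetic variation": ["dbSNP"],
}


def deposit_targets(data_types):
    """Map each data type in a study to the candidate repositories for it.

    Data types without an entry in the table get an empty list.
    """
    return {data_type: REPOSITORIES.get(data_type, []) for data_type in data_types}


# Hypothetical study that combines several data types.
study = ["expression", "metabolomics", "phenotype"]
targets = deposit_targets(study)

for data_type, repos in targets.items():
    print(f"{data_type}: {repos or 'no repository in this table'}")
```

A single study ends up split across several repositories, and the phenotypic data has no home in this table at all.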
33. Teams answering real questions
• Finds needs and solutions
• Combines across communities
• Fun! And inspiring
• Interesting, publishable results
34. Starting a database is easy
• What about sustainability?
• Core resources need:
– Long-term funding
– Regular monitoring
• Integration in communities