Marco Brandizi <marco.brandizi@rothamsted.ac.uk>
Oct, 16th - iGEM 2020 webinar
Better Data for a Better World
Find this presentation on SlideShare
Background source: https://pxhere.com/en/photo/857152
Hello!
• Geek since the 1980s and the C=64 days
• Started working with Life Science Data in 2003
• at Univ. of Milano-Bicocca, EMBL-EBI
• and now Rothamsted Research
• Meanwhile, (h)activism in open source, open data
A Long History
Mankind and Data
• Gather knowledge
• Know how things work, make predictions
• Improve our lives
• (in addition to being good in itself)
Egypt, 2500BC (https://brewminate.com/census-taking-in-the-ancient-world/)
In the past 20yrs or so
Economist, 2010
(https://www.economist.com/node/21521548)
Why and How?
In the past 20yrs or so
We advanced in
• Gathering (eg, smartphones, IoT, 5G)
• Storing (eg, clouds)
• Processing (eg, AI, Machine Learning)
• Sharing (eg, web, standards, data portals)
• Searching (eg, NoSQL, Indexing)
• Visualising (eg, literature on HCI, data charts)...
...Data, Information, Knowledge
Precision Farming
Images Source:
http://ieeeagra.com/ieeeagra/Downloads/20141204-Fernandez-Presentation.pdf
and establish virtuous circles
(background: https://www.flickr.com/photos/kevinmgill/14676390490/in/photostream/)
A World of Openness
The Cause for Open Data/Knowledge
• Data portals, policies, standards
• https://www.data.gov/, https://data.gov.uk/
• https://www.europeandataportal.eu/en
• https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sector-information
• https://joinup.ec.europa.eu/
• In science
• https://fairsharing.org/
• https://www.nature.com/sdata/
• Data and activism
• DBpedia, aka Wikipedia as data (https://wiki.dbpedia.org/about)
• Wikidata (https://www.wikidata.org/)
• Open Street Map (https://www.openstreetmap.org/about)
Open Data Cause: The Life Science Use Case
https://evaprofecmc.jimdofree.com/unit-4-the-genetic-revolution/2-2-chromosomes-and-genes/
So, sequencing was (is) pretty important...
Source: https://boydfuturist.wordpress.com/tag/human-genome-project/
(also an interesting read)
...indeed
Recommended:
• The race to sequence the human genome
https://www.youtube.com/watch?v=AhsIF-cmoQQ
• The Human Genome Project Race
https://genomics-old.soe.ucsc.edu/research/hgp_race
• How to sequence the human genome
https://www.youtube.com/watch?v=MvuYATh7Y74
Fast-forward to nowadays
Which integrates with a wealth of (open) data
And allows for Reuse and further Advancements
The Cause for Open Data: Practical Reasons
• Allows for reuse
• no need to regenerate
• less expensive
• Allows for integration between heterogeneous data
• different entities (genes, proteins, chemistry, species, literature...)
• different scales (cells, organs, individuals, populations)
• New discoveries, novel uses
• Reproducible science
• and quality improvement
The Cause for Open Data: Ethical Reasons
• Publicly funded data are ours
• Savings opportunities add up
• (but giving them out for free has a cost)
• Data are ours anyway (eg, genetic data)
• Transparency (and again, reproducibility)
• Public benefits outweigh private interests
But, how?
Based on publications, which genes are related to yellow
rust? In which biological processes are their encoded
proteins involved?
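A question like the one above is essentially a walk over linked data: publications point to genes and traits, genes to proteins, proteins to biological processes. The following is only a toy sketch of that idea in plain Python. All identifiers (PAPER_1, GENE_A, PROCESS_X, the relation names) are hypothetical placeholders and not real biological data, and a real resource would use a graph database and its query language instead.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, relation, object).
# Every identifier and relation name here is a placeholder, not real data.
TRIPLES = [
    ("PAPER_1", "mentions", "GENE_A"),
    ("PAPER_1", "mentions", "yellow rust"),
    ("PAPER_2", "mentions", "GENE_B"),
    ("PAPER_2", "mentions", "yellow rust"),
    ("PAPER_3", "mentions", "GENE_C"),
    ("GENE_A", "encodes", "PROTEIN_A"),
    ("GENE_B", "encodes", "PROTEIN_B"),
    ("GENE_C", "encodes", "PROTEIN_C"),
    ("PROTEIN_A", "participates_in", "PROCESS_X"),
    ("PROTEIN_B", "participates_in", "PROCESS_X"),
    ("PROTEIN_B", "participates_in", "PROCESS_Y"),
    ("PROTEIN_C", "participates_in", "PROCESS_Z"),
]


def genes_and_processes(triples, trait):
    """Map each gene co-mentioned with `trait` to its proteins' processes."""
    # Index objects by (subject, relation) for quick lookups.
    index = defaultdict(set)
    for subject, relation, obj in triples:
        index[(subject, relation)].add(obj)

    # Step 1: publications that mention the trait.
    papers = {s for s, r, o in triples if r == "mentions" and o == trait}

    result = {}
    for paper in papers:
        # Step 2: entities mentioned in the same publication.
        for entity in index.get((paper, "mentions"), set()):
            proteins = index.get((entity, "encodes"), set())
            if not proteins:
                continue  # not a gene (eg, the trait itself)
            # Step 3: processes the encoded proteins take part in.
            processes = set()
            for protein in proteins:
                processes |= index.get((protein, "participates_in"), set())
            result[entity] = sorted(processes)
    return result


answer = genes_and_processes(TRIPLES, "yellow rust")
for gene in sorted(answer):
    print(gene, "->", answer[gene])
```

The point of the sketch is that the answer only comes out if publications, genes, proteins and processes share identifiers that can be joined, which is where the standards discussed next come in.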
Good Data Principles:
Interoperability through Standards
https://tinyurl.com/y5e6kfa2
https://doi.org/10.1186/s41074-019-0055-1
https://tinyurl.com/y3h9c65k
https://tinyurl.com/y2wzlwbk
Data Standards: schema.org example
https://www.bbcgoodfood.com/recipes/classic-potato-salad
Source & recommended read: https://www.slideshare.net/NiallBeard/bioschemas-workshop
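To give an idea of what a schema.org description looks like, here is a minimal sketch that builds a JSON-LD descriptor for a recipe page. The `Recipe` type and the `name`, `url` and `recipeIngredient` properties belong to schema.org; the recipe name is guessed from the URL above, the ingredient list is a placeholder, and the helper `recipe_jsonld` is made up for this illustration. It is not the markup actually published on that page.

```python
import json


def recipe_jsonld(name, url, ingredients):
    """Build a minimal schema.org Recipe descriptor as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": name,
        "url": url,
        "recipeIngredient": list(ingredients),
    }


doc = recipe_jsonld(
    name="Classic potato salad",  # assumed from the URL, not checked
    url="https://www.bbcgoodfood.com/recipes/classic-potato-salad",
    ingredients=["potatoes", "mayonnaise"],  # placeholder values
)

# A web page would embed this text in a <script type="application/ld+json"> tag.
print(json.dumps(doc, indent=2))
```

Because the property names come from a shared vocabulary, a search engine or any other consumer can read the ingredients without knowing anything about the site that published them.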
schema.org used for Knetminer and Agrifood Data
github.com/Rothamsted/agri-schemas https://tinyurl.com/y44a5lj9
References
• Brandizi et al, 2018, https://europepmc.org/article/med/30085931
• IB2018 presentation https://tinyurl.com/yaq8nt5e
• AgriSchemas and data standards, IB 2019
• Reusing Knetminer data with Python/Jupyter
• https://tinyurl.com/yyhnkuyk
• https://tinyurl.com/y446y979
Good Data Principles: FAIR
• Findable
• eg, give your dataset a DOI that resolves to a schema.org descriptor, and register it on datasetsearch.research.google.com
• Accessible
• eg, a resolvable DOI makes it accessible. Wrap it with access control as needed
• Interoperable
• eg, data described with schema.org, GO and other OBO ontologies
• Query protocols/standards (eg, SPARQL, GraphQL APIs, JSON Schema APIs, JSON-LD APIs)
• Reusable
• Clear licence
• Ideally, machine-readable licence (eg, ccREL)
Source and recommended read: https://tinyurl.com/yxocd3b9
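As a rough illustration of the FAIR examples above, the sketch below describes a dataset with schema.org and runs a very crude check on the descriptor. `Dataset`, `identifier` and `license` are schema.org terms; the DOI and dataset name are placeholders, and `fair_gaps` is a hypothetical helper written for this example. It does not replace a proper FAIR assessment.

```python
# Placeholder descriptor: the DOI and name below are invented for illustration.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example dataset",
    "identifier": "https://doi.org/10.5555/example-dataset",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}


def fair_gaps(descriptor):
    """Return a list of obvious FAIR gaps in a schema.org-style descriptor."""
    gaps = []
    identifier = str(descriptor.get("identifier", ""))
    if not identifier.startswith("https://doi.org/"):
        gaps.append("Findable/Accessible: no resolvable DOI in 'identifier'")
    if descriptor.get("@context") != "https://schema.org":
        gaps.append("Interoperable: not described with schema.org")
    if not descriptor.get("license"):
        gaps.append("Reusable: no 'license' given")
    return gaps


print(fair_gaps(dataset))  # no gaps found for the placeholder descriptor
print(fair_gaps({"name": "Undocumented spreadsheet"}))
```

A check like this only looks at the metadata. Whether the DOI really resolves and whether the licence really allows the intended reuse still has to be verified separately.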
Issues: Easier to Say than to Do
https://tinyurl.com/yxsftwvy
https://xkcd.com/927/
Issues: Common Good vs Private Interests
• ...Parts of the standard that are not priorities for Google are not well documented
anywhere. If they are priorities for Google, however, Google itself provides excellent
documentation about how information should be specified in schema.org so that Google
can use it. Because schema.org’s documentation is poor, the focus of attention stays on
Google.
Time to end Google’s domination of schema.org,
https://tinyurl.com/y6j7ke8u
• Not everyone wants data published, eg, failed clinical trials
• Balance needed between research needs and private lives, eg,
• The Immortal Life of Henrietta Lacks, Rebecca Skloot
• k-anonymity, mediation approaches
(Brandizi et al, 2017, https://doi.org/10.1186/s12911-017-0424-6)
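To make k-anonymity more concrete: a table is k-anonymous when every combination of quasi-identifier values (attributes that could be joined with outside data to re-identify someone) is shared by at least k records. Here is a minimal sketch of that check. The records, column names and the function `is_k_anonymous` are invented for illustration and are not taken from the cited paper.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())


# Invented toy records, not real patient data.
records = [
    {"age_band": "30-39", "postcode": "AL5", "diagnosis": "A"},
    {"age_band": "30-39", "postcode": "AL5", "diagnosis": "B"},
    {"age_band": "40-49", "postcode": "AL5", "diagnosis": "A"},
]

# The single 40-49 record stands alone, so the table is not 2-anonymous.
print(is_k_anonymous(records, ["age_band", "postcode"], k=2))

# Dropping the age band (a form of generalisation) puts all three in one group.
print(is_k_anonymous(records, ["postcode"], k=3))
```

The second call shows the usual trade-off: coarser data protect individuals better, but are less useful for research.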
Issues: Data are Power
http://www.tylervigen.com/spurious-correlations
Issues: Data are Power
• My son was a typically developing toddler. ... He received his first MMR at 19 months of
age. The change in him was almost immediate. He did not regress in development, but
his social skills became extremely compromised. Noises became unbearable...
MMR vaccine caused my son's autism, https://tinyurl.com/y2udlfcb
It's sad, but it's a spurious correlation: vaccines do not cause autism
Issues: are We in Control?
https://www.nature.com/articles/d41586-020-01874-9
https://tinyurl.com/yxay8w2j
https://www.bbc.com/news/business-42959755
https://tinyurl.com/ydykjugt
https://tinyurl.com/hu3lh32
And Which Control?
https://tinyurl.com/y2yjrkpa
https://tinyurl.com/y82zf8qu
https://www.youtube.com/watch?v=ciBLsJkQ1WY
So...
• The future is even more digital
• And even more data-intensive
• Everyone should at least have an idea
• Especially if you want to become a scientist
• About producing data (eg, FAIR, formats, standards)
• And consuming data (eg, data resources, Graph DB query languages)
• And more (eg, Python, Pandas, Graph DBs, APIs)
https://tinyurl.com/y5rdq7qx
So...
• We probably need better management and (a bit of, international) regulation
• of technical aspects (eg, PA standards, research data publishing)
• of ethical aspects (eg, open access, algorithms, censorship)
• But also more grassroots participation
• we are all responsible, especially as scientists
• Data science is cool!
https://tinyurl.com/y5rdq7qx
Acknowledgements
• Ajit Singh, Software Engineer
• Joseph Hearnshaw, software engineer
• Samiul Haque, Ed Eyles, IT admins
• Alice Minotto, Earlham Inst, hosting providers
• William Brown, Ricardo Gregorio, IT admins
• Monika Mistry, Master's student, data curator
• Sandeep Amberkar, bioinformatician, data curator
• Madhu Donepudi, Richard Holland, ext contractors, developers
• Keywan Hassani-Pak, KnetMiner Team Leader
• Chris Rawlings, Head of Computational & Analytical Sciences
• Jeremy Parsons, Bioinformatics Scientist
And You!
Extras
The Cause for Open Data/Knowledge
• Open data is the idea that some data should be freely available to everyone to use and
republish as they wish, without restrictions from copyright, patents or other mechanisms of
control (https://en.wikipedia.org/wiki/Open_data)
• Popularised by Obama in 2009 [1], Tim Berners-Lee [2], Hans Rosling [3] (recommended readings/watches)
• [1] https://www.govtech.com/data/What-Obama-Did-for-Tech-Transparency-and-Open-Data.html
• [2] https://www.ted.com/talks/tim_berners_lee_the_next_web?language=en
• [3] https://www.ted.com/talks/hans_rosling_the_best_stats_you_ve_ever_seen
IBM Watson
• Not the first time that AI beat humans at a task once thought to require human intelligence (eg, Deep Blue and chess, 1996-97)
• But a big milestone (winning the Jeopardy! quiz show in 2011) in knowledge management
• Specialisations possible, eg, IBM Watson Health
Mini documentary at
https://www.youtube.com/watch?v=P18EdAKuC1U
Surprising Data Insights
• Couples who argue often are more likely to last long (90% accuracy)
• If you want such a life...
• Many other examples of surprising data: 9 Bizarre and Surprising Insights from Data Science (https://tinyurl.com/yywgr2rv)
https://www.businessinsider.com/mathematical-secret-to-lasting-relationships-2015-6
Issues: Data are Power
Source and recommended read:
https://theconversation.com/five-maps-that-will-change-how-you-see-the-world-74967

Contenu connexe

Tendances

Biovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literatureBiovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literaturepetermurrayrust
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesSciBite Limited
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHpetermurrayrust
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Alejandra Gonzalez-Beltran
 
Finding and Accessing Human Genomics Datasets
Finding and Accessing Human Genomics DatasetsFinding and Accessing Human Genomics Datasets
Finding and Accessing Human Genomics DatasetsManuel Corpas
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planC. Tobin Magle
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europepetermurrayrust
 
ContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific LiteratureContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific Literaturepetermurrayrust
 
Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literaturepetermurrayrust
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkPaul Groth
 
Bioinformatics in the Era of Open Science and Big Data
Bioinformatics in the Era of Open Science and Big DataBioinformatics in the Era of Open Science and Big Data
Bioinformatics in the Era of Open Science and Big DataPhilip Bourne
 
BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...Alejandra Gonzalez-Beltran
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectivepetermurrayrust
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKpetermurrayrust
 
A Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and WikidataA Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and Wikidatapetermurrayrust
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literaturepetermurrayrust
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureTheContentMine
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of FoodBenjamin Good
 

Tendances (20)

The expansive reach of ChemSpider as a resource for the chemistry community
The expansive reach of ChemSpider as a resource for the chemistry communityThe expansive reach of ChemSpider as a resource for the chemistry community
The expansive reach of ChemSpider as a resource for the chemistry community
 
Biovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literatureBiovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literature
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future Challenges
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
 
Finding and Accessing Human Genomics Datasets
Finding and Accessing Human Genomics DatasetsFinding and Accessing Human Genomics Datasets
Finding and Accessing Human Genomics Datasets
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management plan
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europe
 
ContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific LiteratureContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific Literature
 
Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literature
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic Framework
 
Bioinformatics in the Era of Open Science and Big Data
Bioinformatics in the Era of Open Science and Big DataBioinformatics in the Era of Open Science and Big Data
Bioinformatics in the Era of Open Science and Big Data
 
BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...
 
eScience at the Royal Society of Chemistry and our current initiatives
eScience at the Royal Society of Chemistry and our current initiativeseScience at the Royal Society of Chemistry and our current initiatives
eScience at the Royal Society of Chemistry and our current initiatives
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspective
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UK
 
A Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and WikidataA Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and Wikidata
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of Food
 

Similaire à Better Data for a Better World

Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AlonePhilip Bourne
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangePhilip Bourne
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedPhilip Bourne
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?LEARN Project
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forumChris Dwan
 
Open Data in a Global Ecosystem
Open Data in a Global EcosystemOpen Data in a Global Ecosystem
Open Data in a Global EcosystemPhilip Bourne
 
A coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonA coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonAfrican Open Science Platform
 
GODAN presentation with South Chinese Scientific Institutions
GODAN presentation with South Chinese Scientific InstitutionsGODAN presentation with South Chinese Scientific Institutions
GODAN presentation with South Chinese Scientific InstitutionsJohannes Keizer
 
Informatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeInformatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeLiz Lyon
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Carole Goble
 
Introduction to Open Science and EOSC
Introduction to Open Science and EOSCIntroduction to Open Science and EOSC
Introduction to Open Science and EOSCSarah Jones
 
CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...Johann van Wyk
 
Open Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonOpen Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonAfrican Open Science Platform
 
One View of Data Science
One View of Data ScienceOne View of Data Science
One View of Data SciencePhilip Bourne
 
Rising tide of data update 20171024
Rising tide of data update 20171024Rising tide of data update 20171024
Rising tide of data update 20171024Keith Russell
 
Rising tide of data update
Rising tide of data update Rising tide of data update
Rising tide of data update ARDC
 
New and Emerging Forms of Data
New and Emerging Forms of DataNew and Emerging Forms of Data
New and Emerging Forms of DataDavid De Roure
 
2019 June 27 - Big data and data science
2019 June 27 - Big data and data science2019 June 27 - Big data and data science
2019 June 27 - Big data and data scienceFabio Stella
 
Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014Susanna-Assunta Sansone
 

Similaire à Better Data for a Better World (20)

Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not Alone
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything Change
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forum
 
Open Data in a Global Ecosystem
Open Data in a Global EcosystemOpen Data in a Global Ecosystem
Open Data in a Global Ecosystem
 
A coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonA coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon Hodson
 
GODAN presentation with South Chinese Scientific Institutions
GODAN presentation with South Chinese Scientific InstitutionsGODAN presentation with South Chinese Scientific Institutions
GODAN presentation with South Chinese Scientific Institutions
 
Informatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeInformatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data Decade
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
 
Introduction to Open Science and EOSC
Introduction to Open Science and EOSCIntroduction to Open Science and EOSC
Introduction to Open Science and EOSC
 
CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...
 
Open Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonOpen Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon Hodson
 
One View of Data Science
One View of Data ScienceOne View of Data Science
One View of Data Science
 
Rising tide of data update 20171024
Rising tide of data update 20171024Rising tide of data update 20171024
Rising tide of data update 20171024
 
Rising tide of data update
Rising tide of data update Rising tide of data update
Rising tide of data update
 
New and Emerging Forms of Data
New and Emerging Forms of DataNew and Emerging Forms of Data
New and Emerging Forms of Data
 
2019 June 27 - Big data and data science
2019 June 27 - Big data and data science2019 June 27 - Big data and data science
2019 June 27 - Big data and data science
 
2016 08 gxaas
2016 08 gxaas2016 08 gxaas
2016 08 gxaas
 
Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014
 

Plus de Rothamsted Research, UK

Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesRothamsted Research, UK
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasRothamsted Research, UK
 
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Publishing and Consuming FAIR DataA Case in the Agri-Food DomainPublishing and Consuming FAIR DataA Case in the Agri-Food Domain
Publishing and Consuming FAIR Data A Case in the Agri-Food DomainRothamsted Research, UK
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Rothamsted Research, UK
 
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMinerA Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMinerRothamsted Research, UK
 
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...Rothamsted Research, UK
 
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Rothamsted Research, UK
 
graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...Rothamsted Research, UK
 
myEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference servicemyEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference serviceRothamsted Research, UK
 
BioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialBioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialRothamsted Research, UK
 

Plus de Rothamsted Research, UK (20)

Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use Cases
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with Bioschemas
 
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Publishing and Consuming FAIR DataA Case in the Agri-Food DomainPublishing and Consuming FAIR DataA Case in the Agri-Food Domain
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
 
Continuos Integration @Knetminer
Continuos Integration @KnetminerContinuos Integration @Knetminer
Continuos Integration @Knetminer
 
AgriSchemas Progress Report


Better Data for a Better World

  • 1. Marco Brandizi <marco.brandizi@rothamsted.ac.uk> Oct, 16th - iGEM 2020 webinar Better Data for a Better World Find this presentation on SlideShare Background source: https://pxhere.com/en/photo/857152
  • 2. Hello! • Geek since the 1980s and C=64 times • Started working with Life Science Data in 2003 • at Univ. of Milano-Bicocca, EMBL-EBI • and now Rothamsted Research • Meanwhile, (h)activism in open source, open data
  • 3. A Long History Mankind and Data • Gather knowledge • Know how things work, make predictions • Improve our lives • (in addition to being good in itself) Egypt, 2500BC (https://brewminate.com/census-taking-in-the-ancient-world/)
  • 4. In the past 20yrs or so Economist, 2010 (https://www.economist.com/node/21521548)
  • 5. Why and How? In the past 20yrs or so
  • 6. We advanced in • Gathering (eg, smartphones, IoT, 5G) • Storing (eg, clouds) • Processing (eg, AI, Machine Learning) • Sharing (eg, web, standards, data portals) • Searching (eg, NoSQL, Indexing) • Visualising (eg, literature on HCI, data charts)... ...Data, Information, Knowledge and establish virtuous circles [Figure: Precision Farming; image source: http://ieeeagra.com/ieeeagra/Downloads/20141204-Fernandez-Presentation.pdf]
  • 8. The Cause for Open Data/Knowledge • Data portals, policies, standards • https://www.data.gov/, https://data.gov.uk/ • https://www.europeandataportal.eu/en • https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sector-information • https://joinup.ec.europa.eu/ • In science • https://fairsharing.org/ • https://www.nature.com/sdata/ • Data and activism • DBPedia, aka Wikipedia as data (https://wiki.dbpedia.org/about) • Wikidata (https://www.wikidata.org/) • Open Street Map (https://www.openstreetmap.org/about)
  • 9. Open Data Cause: The Life Science Use Case https://evaprofecmc.jimdofree.com/unit-4-the-genetic-revolution/2-2-chromosomes-and-genes/
  • 10. So, sequencing was (is) pretty important... Source: https://boydfuturist.wordpress.com/tag/human-genome-project/ (also an interesting read)
  • 11. ...indeed Recommended: • The race to sequence the human genome https://www.youtube.com/watch?v=AhsIF-cmoQQ • The Human Genome Project Race https://genomics-old.soe.ucsc.edu/research/hgp_race • How to sequence human genome https://www.youtube.com/watch?v=MvuYATh7Y74
  • 13. Which integrates with a wealth of (open) data
  • 14. And allows for Reuse and further Advancements
  • 15. The Cause for Open Data • Allows for reuse • no need to regenerate • less expensive • Allows for integration between heterogeneous data • different entities (genes, proteins, chemistry, species, literature...) • different scales (cells, organs, individuals, populations) • New discoveries, novel uses • Reproducible science • and quality improvement Practical Reasons
  • 16. The Cause for Open Data • Public-funded data are ours • Savings opportunities add up • (but giving them out for free has a cost) • Data are ours anyway (eg, genetic data) • Transparency (and again, reproducibility) • Public benefits outweigh private interests Ethical Reasons
  • 17. But, how? Based on publications, which genes are related to yellow rust? In which biological processes are their encoded proteins involved?
  • 18. Good Data Principles: Interoperability through Standards https://tinyurl.com/y5e6kfa2 https://doi.org/10.1186/s41074-019-0055-1 https://tinyurl.com/y3h9c65k https://tinyurl.com/y2wzlwbk
  • 19. Data Standards: schema.org example https://www.bbcgoodfood.com/recipes/classic-potato-salad Source & recommended read: https://www.slideshare.net/NiallBeard/bioschemas-workshop
  • 20. schema.org used for Knetminer and Agrifood Data github.com/Rothamsted/agri-schemas https://tinyurl.com/y44a5lj9
  • 21. References • Brandizi et al, 2018, https://europepmc.org/article/med/30085931 • IB2018 presentation https://tinyurl.com/yaq8nt5e • AgriSchemas and data standards, IB 2019 • Reusing Knetminer data with Python/Jupyter • https://tinyurl.com/yyhnkuyk • https://tinyurl.com/y446y979
  • 22. Good Data Principles: FAIR • Findable • eg, give your dataset a DOI, which resolves to a schema.org descriptor, and register it on datasetsearch.research.google.com • Accessible • eg, a resolvable DOI makes it accessible. Wrap with access control as needed • Interoperable • eg, data described with schema.org, GO and other OBO ontologies • Query protocols/standards (eg, SPARQL, GraphQL APIs, JSON Schema APIs, JSON-LD APIs) • Reusable • Clear licence • Ideally, machine-readable licence (eg, CCREL) Source and recommended read: https://tinyurl.com/yxocd3b9
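The "Findable" and "Reusable" points above can be illustrated with a minimal sketch of a schema.org descriptor serialised as JSON-LD. The dataset name, the DOI and the keywords are placeholders invented for the example, and the helper `dataset_descriptor` is hypothetical; only the schema.org property names (`name`, `description`, `identifier`, `license`, `keywords`) are real.

```python
import json


def dataset_descriptor(name, description, doi, licence_url, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    This is an illustrative helper, not part of any library.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # A resolvable DOI URL doubles as a stable identifier.
        "identifier": f"https://doi.org/{doi}",
        # An explicit licence URL keeps the reuse terms machine-readable.
        "license": licence_url,
        "keywords": list(keywords),
    }


# All values below are made up for the sake of the example.
descriptor = dataset_descriptor(
    name="Example wheat knowledge graph",
    description="Genes, proteins and publications linked in a graph.",
    doi="10.1234/example-doi",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["wheat", "knowledge graph", "FAIR"],
)

print(json.dumps(descriptor, indent=2))
```

Embedding such a JSON-LD document in the dataset's landing page is one common way to let dataset search engines pick it up.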
  • 23. Issues: Easier to Say than to Do https://tinyurl.com/yxsftwvy https://xkcd.com/927/
  • 24. Issues: Common Good vs Private Interests • ...Parts of the standard that are not priorities for Google are not well documented anywhere. If they are priorities for Google, however, Google itself provides excellent documentation about how information should be specified in schema.org so that Google can use it. Because schema.org’s documentation is poor, the focus of attention stays on Google. Time to end Google’s domination of schema.org, https://tinyurl.com/y6j7ke8u • Not everyone wants data published, eg, failed clinical trials • Balance needed between research needs and private lives, eg, • The Immortal Life of Henrietta Lacks, Rebecca Skloot • k-anonymity, mediation approaches (Brandizi et al, 2017, https://doi.org/10.1186/s12911-017-0424-6)
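As a rough illustration of the k-anonymity idea mentioned above: a table is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The sketch below follows that general definition only, it is not the method of the cited paper, and the records and column names are invented for the example.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for all the quasi-identifier columns (0 if no records)."""
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical, already generalised records (age ranges, postcode areas).
records = [
    {"age_range": "30-39", "postcode_area": "AL5", "diagnosis": "A"},
    {"age_range": "30-39", "postcode_area": "AL5", "diagnosis": "B"},
    {"age_range": "40-49", "postcode_area": "AL5", "diagnosis": "A"},
    {"age_range": "40-49", "postcode_area": "AL5", "diagnosis": "C"},
    {"age_range": "40-49", "postcode_area": "AL5", "diagnosis": "A"},
]

k = k_anonymity(records, ["age_range", "postcode_area"])
print(k)  # 2: the smallest group (30-39, AL5) has two records
```

A publisher would typically generalise or suppress values until k reaches a threshold they consider acceptable for the data at hand.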
  • 25. Issues: Data are Power http://www.tylervigen.com/spurious-correlations
  • 26. Issues: Data are Power • My son was a typically developing toddler. ... He received his first MMR at 19 months of age. The change in him was almost immediate. He did not regress in development, but his social skills became extremely compromised. Noises became unbearable... MMR vaccine caused my son's autism, https://tinyurl.com/y2udlfcb It's sad, but it's a spurious correlation: vaccines do not cause autism
  • 27. Issues: are We in Control? https://www.nature.com/articles/d41586-020-01874-9 https://tinyurl.com/yxay8w2j https://www.bbc.com/news/business-42959755 https://tinyurl.com/ydykjugt https://tinyurl.com/hu3lh32
  • 29. So... • Future is even more digital • And even more data-intensive • Everyone should at least have an idea • Especially if you want to become a scientist • About producing data (eg, FAIR, formats, standards) • And consuming data (eg, data resources, Graph DB query languages) • And more (eg, Python, Pandas, Graph DBs, APIs) https://tinyurl.com/y5rdq7qx
  • 30. So... • Probably we need better management and (a bit of, international) regulation • of technical aspects (eg, PA standards, research data publishing) • of ethical aspects (eg, open access, algorithms, censorship) • But also more grassroots participation • we are all responsible, especially as scientists • Data science is cool! https://tinyurl.com/y5rdq7qx
  • 31. Acknowledgements • Ajit Singh, Software Engineer • Joseph Hearnshaw, software engineer • Samiul Haque, Ed Eyles, IT admins • Alice Minotto, Earlham Inst, hosting providers • William Brown, Ricardo Gregorio, IT admins • Monika Mistry, Master's student, data curator • Sandeep Amberkar, bioinformatician, data curator • Madhu Donepudi, Richard Holland, ext contractors, developers • Keywan Hassani-Pak, KnetMiner Team Leader • Chris Rawlings, Head of Computational & Analytical Sciences • Jeremy Parsons, Bioinformatics Scientist
  • 32. Acknowledgements • Ajit Singh, Software Engineer • Joseph Hearnshaw, software engineer • Samiul Haque, Ed Eyles, IT admins • Alice Minotto, Earlham Inst, hosting providers • William Brown, Ricardo Gregorio, IT admins • Monika Mistry, Master's student, data curator • Sandeep Amberkar, bioinformatician, data curator • Madhu Donepudi, Richard Holland, ext contractors, developers • Keywan Hassani-Pak, KnetMiner Team Leader • Chris Rawlings, Head of Computational & Analytical Sciences • Jeremy Parsons, Bioinformatics Scientist • And You!
  • 34. The Cause for Open Data/Knowledge • Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control (https://en.wikipedia.org/wiki/Open_data) • Popularised by Obama in 2009 [1], Tim Berners-Lee [2], Hans Rosling [3] (recommended readings/watches) • [1] https://www.govtech.com/data/What-Obama-Did-for-Tech-Transparency-and-Open-Data.html • [2] https://www.ted.com/talks/tim_berners_lee_the_next_web?language=en • [3] https://www.ted.com/talks/hans_rosling_the_best_stats_you_ve_ever_seen
  • 35. IBM Watson • Not the first time that AI passed the Turing test (eg, Deep Blue and Chess, 1996) • But big milestone (in 2011) about knowledge management • Specialisations possible, e.g., IBM Watson Health Mini documentary at https://www.youtube.com/watch?v=P18EdAKuC1U
  • 36. Surprising Data Insights • Couples who argue often are more likely to last long (90% accuracy) • If you want such a life... • Many other examples of surprising data: 9 Bizarre and Surprising Insights from Data Science (https://tinyurl.com/yywgr2rv) https://www.businessinsider.com/mathematical-secret-to-lasting-relationships-2015-6
  • 37. Issues: Data are Power Source and recommended read: https://theconversation.com/five-maps-that-will-change-how-you-see-the-world-74967