7. “Scientists like to think of science as
self-correcting. To an alarming degree, it is not.”
8. A crisis of replicability and credibility?
Pre-clinical oncology – 89% not reproducible
Reasons
• Fraudulent behaviour
• Invalid reasoning
• Absent or inadequate data and/or metadata
Causes?
• Pressure to publish
• Pressure to make excessive claims
• Data hoarding
• Poor data science
Solutions: Open data & Valid Methods
9. What open data means
For effective communication, replication and re-purposing
we need intelligent openness. Data, meta-data and,
increasingly, software/machine code must be:
• Discoverable
• Accessible
• Intelligible
• Assessable
• Re-usable
Only when these criteria are met are data properly open.
& not only the data & meta-data
but also the software used to
manipulate it
The data providing the evidence for a published concept
MUST be concurrently published.
To do otherwise should come to be regarded by all, including
journals, as scientific MALPRACTICE.
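As an illustration only, the five criteria above could be checked against a dataset's metadata record. This is a minimal sketch: the field names (`persistent_identifier`, `download_url` and so on) and the mapping from criterion to field are assumptions made for the example, not part of any standard or of the original slides.

```python
# Hypothetical mapping from each openness criterion to the metadata
# fields that would count as evidence for it. All field names are
# invented for illustration.
CRITERIA = {
    "discoverable": ["persistent_identifier"],
    "accessible": ["download_url"],
    "intelligible": ["description", "variable_definitions"],
    "assessable": ["creator", "funding_source"],
    "re-usable": ["licence", "software_version"],
}


def openness_gaps(record):
    """Map each unmet criterion to the fields that are missing or empty."""
    gaps = {}
    for criterion, fields in CRITERIA.items():
        missing = [f for f in fields if not record.get(f)]
        if missing:
            gaps[criterion] = missing
    return gaps


# A made-up record that satisfies only some of the criteria.
record = {
    "persistent_identifier": "example-id-001",
    "download_url": "https://example.org/data.csv",
    "description": "Hypothetical dataset",
    "creator": "A. Researcher",
    "licence": "CC-BY-4.0",
}

for criterion, missing in openness_gaps(record).items():
    print(f"{criterion}: missing {', '.join(missing)}")
```

A record that returns no gaps would, under these assumed fields, meet all five criteria; a real check would need community-agreed metadata standards rather than this ad hoc list.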
10. Linear regression
Cluster analysis
Dynamic/complex behaviour
Complex systems
No mathematical pipeline
Glucose levels in Type II Diabetes
Simple relationships
Classical statistics
Topological analysis
Valid reasoning: e.g. coping with complex data
14. From “simple” science to complexity,
from uncoupled to highly coupled systems
Uncoupled systems
The behaviour of highly coupled systems
15. Simulating a complex system
Characterising a complex system
Image of brain cells in a rat
Emergent behaviour of a specific 6-component coupled system
Complex systems
16. Semantic Web of Broad, Linked Data
Subject – Predicate – Object - e.g. A Rice Ontology
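The subject, predicate, object pattern can be sketched with plain tuples. This is a minimal illustration, not a real ontology: the triples and predicate names below are invented for the example and are not taken from any published rice ontology.

```python
from collections import namedtuple

# One statement of linked data: subject - predicate - object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical triples, invented for illustration only.
triples = [
    Triple("Oryza sativa", "is_a", "cereal crop"),
    Triple("Oryza sativa", "has_trait", "drought tolerance"),
    Triple("drought tolerance", "measured_by", "yield under water stress"),
    Triple("IR64", "variety_of", "Oryza sativa"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that match every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


def linked(triples, start):
    """Follow links outward from `start` and return every node reached."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for t in match(triples, subject=node):
            if t.object not in seen:
                seen.add(t.object)
                frontier.append(t.object)
    return seen


print(match(triples, subject="Oryza sativa"))
print(linked(triples, "IR64"))
```

Because the object of one triple can be the subject of another, statements from separate sources join into one graph. That joining is what "linked" refers to here.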
17. 1. Maintaining “self-correction”
2. Open knowledge is creative & productive
“If you have an apple and I have an apple and we
exchange these apples, then you and I will still
each have one apple. But if you have an idea and I
have an idea and we exchange these ideas, then
each of us will have two ideas.”
George Bernard Shaw
3. Open data enables semantic linking
Why openness & sharing?
18. Mathematics related discussions
Tim Gowers
- crowd-sourced mathematics
An unsolved problem posed on
his blog.
32 days – 27 people – 800
substantive contributions
Emerging contributions rapidly
developed or discarded
Problem solved!
“It's like driving a car whilst
normal research is like pushing
it”
What inhibits such processes?
- The criteria for credit and
promotion
– ALTMETRICS THE ANSWER?
New modes of technology-enabled creativity:
e.g. crowd-sourcing
19. • Openly collected science is already helping policy
makers.
• AshTag app allows users to submit photos and
locations of sightings to a team who will refer them on
to the Forestry Commission, which is leading efforts to
stop the disease's spread with the Department for
Environment, Food and Rural Affairs (Defra).
Chalara spread: 1992-2012
Citizen Science
21. EMBL-EBI services
Labs around the world send us their data and we…
Archive it
Classify it
Share it with other data providers
Analyse, add value and integrate it
…provide tools to help researchers use it
A collaborative enterprise
… seizing the data opportunities depends
on an ethos of data-sharing
e.g. a growing number of disciplinary communities
22. Institutional management and support
National policies & e-infrastructure
Open Research Data
Big Data Analytics
Knowledge Output
EXPLOITING THE DATA REVOLUTION
Scientific inference
A national data-intensive system
23. International Research Data Collaboration
CODATA
Policies & practice
Frontiers of data
science
Capacity Building
WDS
• Data stewardship
• Data standards
RDA
• Interoperability
24. Science International:
international voice of policy for science
International Council for Science (ICSU), International Social Science Council (ISSC),
The World Academy of Sciences (TWAS), InterAcademy Partnership (IAP)
Agenda:
• An International Accord: Open Data in a Big Data World (principles)
• African Data Science Capacity Mobilisation Initiative
First, action-oriented meeting
South Africa, December 2015
25. Responsibilities
• Scientists
• Research Institutions and universities
• Publishers
• Funding agencies
• Professional associations, scholarly societies & academies
• Libraries, archives & repositories
Boundaries of openness
Enabling practices
• Citation & provenance
• Interoperability
• Non-restrictive reuse
• Linkability
Science International Principles for Open Data
26. i. Publicly funded scientists have a responsibility to contribute
to the public good through the creation and communication
of new knowledge, of which associated data are intrinsic
parts. They should make such data openly available to
others as soon as possible after their production in ways
that permit them to be re-used and re-purposed.
ii. The data that provide evidence for published scientific
claims should be made concurrently and publicly available
in an intelligently open form in a way that permits the logic
of the link between data and claim to be rigorously
scrutinised and the validity of the data to be tested by
replication of experiments or observations. To the extent
possible, data should be deposited in managed repositories
with low access barriers.
Principles for Open Data
Responsibilities of scientists
27. Possible Regional Platforms for Open Science
African Platform
Asian Platform?
?
Australian Platform
Shared investment in infrastructure; harvesting and circulating good ideas;
spreading and supporting good practice; capacity building; promoting
applications; linking to international programmes and standards.
?
28. A taxonomy of open science (research)
- a journey towards science as a public enterprise
Inputs Outputs & open engagement with
Open access
Administrative data (held by public authorities e.g. prescription data)
Public Sector Research data (e.g. Met Office weather data)
Research Data (e.g. CERN, generated in universities)
Research publications (i.e. papers in journals)
Open data
Open science
Collecting the data
Doing research
Doing science openly
Researchers - Govt & Public sector - Businesses - Citizens - Citizen scientists
Co-production of knowledge
Information & knowledge as public not private goods
34. A barrier to openness? - Analytic overload.
E.g. - Global Earth Observation System of Systems
• What is the human role?
• Can we analyse & scrutinise what is in the
black box? - & who owns the box?
• What does it mean to be a researcher in a
data intensive age?
A disconnect between machine
analysis & human cognition?
Editor's notes
The material advance of human society has been based on the acquisition and use of knowledge, and science, as it has been practised in the last 300 years, has proved to be the most effective way of gaining reliable knowledge. I want to talk about the processes whereby science is done and how they need to adapt to a novel environment in which we are able to acquire, store, manipulate and communicate data of unprecedented volume and complexity. What challenges does this environment pose to the essential processes of science, how can we exploit the opportunities that it offers, and what barriers inhibit necessary changes? This is not about openness for itself, but about open processes in the doing of science. Open science is not new. It was the bedrock on which the extraordinary scientific revolutions of the 18th and 19th centuries were built. But we do need to reinvent it for a data-rich era. So let us start with a little history.
This is Henry Oldenburg, the first secretary of the newly formed Royal Society in the early 1660s. Henry was an inveterate correspondent with those we would now call scientists, both in Europe and beyond. Rather than keep this correspondence private, he thought it would be a good idea to publish it, and persuaded the new Society to do so by creating the Philosophical Transactions, which remains a top-flight journal to the present day. But he demanded two things of his correspondents: that they should submit in the vernacular and not Latin; and that the evidence (data) supporting a concept must be published together with the concept. This permitted others to scrutinise the logic of the concept and the extent to which it was supported by the data, and it permitted replication and re-use. Open publication of concept and evidence is the basis of “scientific self-correction”, which historians of science argue was the crucial building block on which the scientific revolution of the 18th and 19th centuries was built, and which remains fundamental to the progress of science. Openness to scrutiny by scientific peers is the most powerful form of peer review.
But Oldenburg’s world has changed. The last 20 years have seen an unprecedented data storm, which poses both challenges and opportunities to the way science is done.
The fundamental challenge is to scientific self-correction. Journals can no longer contain the data, and neither scientists nor journals have taken the obvious step of making the data relevant to a publication concurrently available in an electronic database. An example is last year’s Nature paper revealing that only 11% of results in 50 benchmark papers in pre-clinical oncology were replicable. If a lack of Oldenburg’s rigour in presenting evidence is widespread, a failure of replicability risks undermining science as a reliable way of acquiring knowledge, and can therefore undermine its credibility.
Openness of itself has no value unless it is “intelligent openness”, where data are:
Accessible – they can be found
Intelligible – they can be understood
Assessable – e.g. does the creator have an interest in a particular outcome?
Re-usable – sufficient meta-data to permit re-use and re-purposing.
These should be standard criteria for an open data regime.
But we must recognise that the amount of meta-data and background data required for intelligent openness to fellow citizens is usually far greater than that required for openness to scientific peers. If all data were to be made intelligently open to fellow citizens on the basis that they have ultimately paid for it, science would stop tomorrow. A way forward would be to make a much greater effort to make data intelligently open in what we could call “public interest science”, including those issues that frequently arise in public debate or concern.
Ash dieback, caused by the fungus Chalara fraxinea, was found in the UK in October outside of plantations and nurseries in East Anglia, raising fears of a repeat of Dutch elm disease, which killed 25 million mature elms in the 1970s and 80s. In an attempt to map and help prevent the spread of the disease across the country, a team of developers and academics worked through the weekend to create an app that smartphone owners can use to report suspected cases of infection. Infected ash trees are recognisable by lesions on their bark, dieback of leaves at the tree's crown, and leaves turning brown, though experts say the arrival of autumn makes the latter harder to spot accurately. The AshTag app for iOS and Android devices allows users to submit photos and locations of sightings to a team who will refer them on to the Forestry Commission, which is leading efforts to stop the disease's spread with the Department for Environment, Food and Rural Affairs (Defra).
Human and technical requirements for a sustainable data infrastructure.
Network of world data centres.
Data policies and data science: bringing data experts together with research scientists.
Lots of interchangeable and fluid terms but many shared principles.
The word “science” is used to mean the systematic organisation of knowledge that can be rationally explained and reliably applied. It is not exclusively restricted to “natural science”.