Recommendations for infrastructure and incentives for open science, presented to the Research Data Alliance 6th Plenary. Presenter: William Gunn, Director of Scholarly Communications for Mendeley.
RDA Scholarly Infrastructure 2015
1. Digital Infrastructure for Reproducibility in Science
RDA – Paris 2015
@mrgunn
https://orcid.org/0000-0002-3555-2054
2. Old Model: Single type of content; single mode of distribution
[Diagram: Scholar – Publisher – Library – Scholar]
FORCE11.org: Future of research communications and e-scholarship
Credit: Maryann Martone, hypothes.is
6. Make your research data citable
• DOIs for your data
• FORCE11-compliant citations (a minimal retrieval sketch follows below)
• Unique DOI for each version
• Integrated into article submission
• Data linked to article
• See metrics of reuse
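As a rough illustration of what citable, DOI-identified data enables, the sketch below asks the DOI resolver for a formatted bibliographic citation via content negotiation. This is not Mendeley Data's own API, just a generic pattern that works for DataCite-registered dataset DOIs; the DOI shown is a made-up placeholder.

```python
# Minimal sketch: fetch a formatted citation for a dataset DOI using
# DOI content negotiation. The DOI used below is a hypothetical placeholder.
import requests

def cite_dataset(doi: str, style: str = "apa") -> str:
    """Ask the DOI resolver to render a bibliographic citation for `doi`."""
    resp = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text.strip()

if __name__ == "__main__":
    # Each version of a deposited dataset gets its own DOI, so citing a
    # specific version is simply citing that version's DOI.
    print(cite_dataset("10.17632/xxxxxxxx.1"))  # placeholder DOI
```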
8. Built for mobile
Mendeley Data has been created to work on a wide range of devices - from your lab workstation to your mobile phone. Browse datasets on the go, or take a picture in the lab and upload it instantly to your research data.
11. Studies looking at the prevalence of irreproducibility estimate a rate of 50% or more
[Chart: estimated irreproducibility rates from Vasilevsky et al., Bayer Healthcare (Prinz et al.), Amgen (Begley and Ellis), Glasziou et al., and Hartshorne and Schachner; individual estimates range from 51% (n=257) to 89% (n=53). Source: GBSI 2015]
12. Economic impact of irreproducibility
• A 50% irreproducibility rate within preclinical research implies that more than $28B/year is spent on research that is irreproducible
[Chart: of the estimated $56.4B US preclinical research spend (2014E), $28.2B (50%) is reproducible and $28.2B (50%) is irreproducible; preclinical R&D ($56B, 49%) and clinical & other spend ($58B, 51%) together make up the $114B total US life science spend (2014E). Source: GBSI 2015]
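For concreteness, the headline figure can be reproduced from the numbers on the slide; this is just a back-of-the-envelope check using the GBSI 2015 estimates quoted above.

```python
# Back-of-the-envelope check of the slide's figures (GBSI 2015 estimates).
preclinical_spend_busd = 56.4   # estimated US preclinical R&D spend, 2014E, in $B
irreproducibility_rate = 0.50   # prevalence estimate carried over from slide 11

irreproducible_busd = preclinical_spend_busd * irreproducibility_rate
print(f"~${irreproducible_busd:.1f}B/year on irreproducible preclinical research")
# prints: ~$28.2B/year on irreproducible preclinical research
```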
13. Retraction?
Only 0.2% of the literature is retracted (vs 50%+ irreproducibility)
Negative findings?
Less than 30% of researchers who could not reproduce published findings published their failure1
Only 14% of the literature reports any negative results2
Additional publications from other academic labs?
“We didn’t see that a target is more likely to be validated if it was reported in ten publications or in two publications”3
Example: Retraction of PLOS4 and Science5 papers by Pamela Ronald at UC Davis
• Self-retraction due to reagent error
• Results had been ‘confirmed’ independently by three other groups6-8
1. Mobley et al. PLOS ONE. 8, e63221 (2013)
2. Fanelli. Scientometrics. 90, 891 (2012)
3. Prinz et al. Nat Rev Drug Discov. 10, 712 (2011)
4. Han et al. PLOS ONE. 6, e29192 (2011)
5. Lee et al. Science. 326, 850 (2009)
6. McCarthy et al. J Bacteriology. 193, 6375 (2011)
7. Shuguo et al. Appl Biochem Biotechnol. 166, 1368 (2012)
8. Qian et al. J Proteome Res. 12, 3327 (2013)
14. What about citations?
None of the reported replication studies has found any correlation with citations (or journal impact factor):
• NINDS - No significant difference1
• Bayer - No significant difference2
• Amgen - “We saw no significant difference in citation rates between papers that were reproducible versus non-reproducible”3
1. Stuart et al. Experimental Neurology 233, 597–605 (2012).
2. Prinz et al. Nat Rev Drug Discov. 10, 712 (2011).
3. Begley and Ellis. Nature. 483, 531-3 (2012)
16. Reproducibility Project: Cancer Biology
Independently replicating key experimental results from the top 50 cancer biology studies from 2010-2012
Learn more at http://cos.io/cancerbiology
21. Suggestions for improvements
Reduce reliance on contacting original authors, who may have moved on or forgotten details by the time of deposit
• Build tools to help capture research workflow and data
• (Semi-)automated, e.g. Riffyn, GenomeStudio
• Update journal formats to capture more information and data, e.g. GigaScience, Nature
• Uniquely identify commercial reagents, e.g. the Resource Identification Initiative (see the RRID sketch after this list)
• Repositories for materials, e.g. Addgene, JAX, ATCC
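To make the reagent-identification point concrete, here is a small sketch of spotting RRIDs (Resource Identification Initiative identifiers) in a methods section. The regex is a simplified assumption rather than the official RRID grammar, and the example identifiers are illustrative; real validation would resolve each ID against the SciCrunch registry.

```python
# Simplified sketch: find candidate RRIDs in free text. The pattern below is an
# assumption, not the official RRID syntax; resolve matches against SciCrunch
# to confirm they identify real resources.
import re

RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+[_:][A-Za-z0-9_:.-]+)")

def find_rrids(text: str) -> list[str]:
    """Return candidate RRID accessions mentioned in `text`."""
    return RRID_PATTERN.findall(text)

methods = (
    "Cells were stained with an anti-GFAP antibody (RRID:AB_305808) "
    "and images were processed in Fiji (RRID:SCR_002285)."
)
print(find_rrids(methods))  # ['AB_305808', 'SCR_002285']
```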
24. Definition of infrastructure – that which you only notice when it breaks
We’re terrible at it, because it’s very bureaucratic and boring
25. Funders want to talk about infrastructure (meetings like this!), but researchers don't.
26. How we do infrastructure now
1. A funder offers money to build infrastructure.
2. Researchers propose to do what they’re already doing but generate infrastructure as a “side effect”.
3. A project gets funded using words such as “open, distributed, lightweight, framework, etc.”
4. The distributed system gets co-opted by a centralized one (or it fades away).
Examples: Handles & URIs -> DOIs, profiles -> ORCID
27. Bilder/Lin/Neylon organizational principles
• Trust can’t be solved by technology.
• A stakeholder-governed organization needs to manage infrastructure.
• Transparency and openness are necessary but not sufficient.
• The organization should aim for sustainability, but must not lobby, and needs formal incentives to replace itself.
• Revenue should be based on services, not data – this builds trust, as the community can go elsewhere if needed.