Role of crowdsourcing
1. Data and text mining workshop
The role of crowdsourcing
Anna Noel-Storr
Wellcome Trust, London, Friday 6th March 2015
2. What is crowdsourcing?
“…the practice of obtaining needed services, ideas, or content by soliciting contributions
from a large group of people, and especially from an online community, rather than from
traditional employees…”
Image credit: DesignCareer
8. Micro-tasking: process
Breaking down a large corpus of data into smaller units
and distributing those units to a large online crowd
“the distribution of small parts of a problem”
9. Human computation
Humans remain better than machines at certain tasks:
e.g. Identifying pizza toppings from a picture of a pizza
e.g. “preventing obesity without eating like a rabbit”.ti. – autotag: Animal study
10. Tools and platforms
What platforms and tools exist and how do they work?
Image credit: ThinkStock
11. The Zooniverse
“each project uses the efforts and ability of volunteers to help
scientists and researchers deal with the flood of data that confronts them”
13. Health related evidence production
Can we use crowdsourcing to identify the
evidence in a more timely way?
- Known pressure point within the review production
- Between 2000 and 5000 citations per new review, but can be much more
- A not much loved task
Trial
identification
14. The Embase project
Step 1: Run a very sensitive search for studies in Embase, the largest biomedical database.
Step 2: Use a crowd to screen thousands of search results from Embase and feed the identified reports of RCTs into CENTRAL, Cochrane's Central Register of Controlled Trials.
[Diagram: Embase search results flow into CENTRAL by two routes – an automatic feed and crowd screening]
How will the crowd do this?
16. The Embase project: recruitment
- 900+ people have signed up to screen citations in 12 months
- 110,000+ citations have been collectively screened
- 4,000 RCTs/q-RCTs identified by the crowd
[Graph: cumulative number of participants, Feb 2014 to Mar 2015, rising from 0 to over 900]
17. Why do people do it?
- Made it very easy to participate (and equally easy to stop!)
- Gain experience (bulk up the CV)
- Provide feedback, both to the individual and to the community (people are more likely to come back)
- Wanting to do something to contribute (healthcare is a strong hook)
18. How accurate is the crowd?
[Diagram: three consecutive RCT decisions send a record to CENTRAL; three consecutive Reject decisions send it to the bin; disagreements and Unsure records go to a Resolver – about 5% of all records]
19. Crowd accuracy

Validation 1 – the crowd: index test; the info specialists: reference standard
TP: 1565   FP: 9
FN: 2      TN: 2888
Sensitivity: 99.9%   Specificity: 99.7%
Enriched sample; blinded to crowd decision; dual independent screeners as reference standard

Validation 2 – the crowd: index test; the info specialist: reference standard
TP: 415   FP: 5
FN: 1     TN: 2649
Sensitivity: 99.8%   Specificity: 99.8%
Enriched sample; blinded to crowd decision; single independent expert screener (me!) as reference standard; possibility of incorporation bias
Individual screener accuracy is also carefully monitored
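The headline figures above follow directly from the confusion matrices. A minimal sketch of the calculation (the function name is illustrative; the TP/FP/FN/TN counts are the ones reported on the slide):

```python
def sens_spec(tp, fp, fn, tn):
    """Return (sensitivity, specificity) as fractions."""
    sensitivity = tp / (tp + fn)   # share of true RCTs the crowd caught
    specificity = tn / (tn + fp)   # share of non-RCTs the crowd correctly rejected
    return sensitivity, specificity

v1 = sens_spec(tp=1565, fp=9, fn=2, tn=2888)   # Validation 1
v2 = sens_spec(tp=415, fp=5, fn=1, tn=2649)    # Validation 2
print(f"Validation 1: sensitivity {v1[0]:.1%}, specificity {v1[1]:.1%}")
print(f"Validation 2: sensitivity {v2[0]:.1%}, specificity {v2[1]:.1%}")
```

Rounded to one decimal place, these reproduce the 99.9%/99.7% and 99.8%/99.8% figures quoted for the two validation studies.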
20. How fast is the crowd?
Length of time to screen one month's worth of records:
[Graph: 6 weeks (Jan 2014) → 5 weeks (Jul 2014) → 2 weeks (Jan 2015)]
More screeners, and more screeners screening more quickly
21. More of the same, and more tasks
As the crowd becomes more efficient, we plan to do two things:
1. Increase the databases we search – feed in more citations from other databases
2. Offer other 'micro-tasks': screen (Y/N → bin), annotate, appraise – e.g. is the healthcare condition Alzheimer's disease? Y, N, Unsure
And in these tasks the machine plays a vital and complementary role…
23. In summary
Crowdsourcing:
• Effective method in large-scale study identification
• Identifies more studies, more quickly
• No compromise on quality or accuracy
• Offers meaningful ways to contribute
• Feasible to recruit a crowd
• Highly functional tool
• Complements data and text mining
And enables the move towards the living review.
Editor's notes
I’m going to talk about the role that crowdsourcing can play in the evidence synthesis process and importantly in the move towards the ‘living review’.
But first: What is crowdsourcing? Broadly speaking it’s: “…”
There are different types – and therefore different approaches and tools are needed – depending on what it is you need or want from the crowd. Brabham's problem-focused crowdsourcing typology is made up of four types: 1. Knowledge discovery and management – where you task your crowd with finding and collecting information into a common location and format.
2. Broadcast search: where the organisation tasks the crowd with solving an empirical problem
3. Peer vetted creative production where the organisation tasks a crowd with creating and selecting creative ideas, and…
4. Distributed human-intelligence tasking which is where the organisation tasks a crowd with analysing large amounts of information
And it’s this fourth type I primarily want to focus on today.
It’s about taking a large corpus of data and breaking it down into much smaller units which are then distributed via the internet to a community of willing volunteers to process. The distribution of small parts of a problem
Some call this kind of work human computation or human intelligence tasking, because these are tasks where human intelligence is still needed, and where humans still outperform the machine. Such as identifying pizza toppings from an image of a pizza, or, of more relevance to us perhaps: recognizing very quickly that an article is not actually about rabbits just because it has rabbit in the title…
So what tools and platforms exist and what data does the crowd help to process?
The Zooniverse, maintained and developed by the Citizen Science Alliance, is the largest and most successful citizen science platform. It began with one project, Galaxy Zoo, over eight years ago. The platform now hosts over 30 projects and has a worldwide community of almost 1.3 million people. In their words, each project uses the “efforts and ability of volunteers to help scientists and researchers deal with the flood of data that confronts them”. Their focus began on all things space related but they have branched out into the humanities and into aspects of healthcare research.
Here are two examples from two different projects hosted on the Zooniverse. The first, Galaxy Zoo, shows the volunteer an image of a galaxy and then asks a series of questions about that image, such as is it in a spiral shape? The second, Operation War Diary, gets volunteers to tag pages of war diaries.
In our field, that of evidence appraisal, production and dissemination, we face similar challenges in keeping up with the amount of data produced. And within Cochrane we have been exploring the role of crowdsourcing in helping us to process the flood of data.
Our efforts so far have largely focused on one well-known pressure point within the review production process: trial identification. Our traditional model is under increasing strain as research output grows exponentially. The identified ‘micro-task’ within this broader task of trial identification is citation screening. It is estimated that the average new systematic review identifies between 2000 and 5000 citations, but this can be much higher for reviews in certain domains or considering certain types of intervention.
What if we could find a reliable and fast way to feed all reports of randomised trials into one central repository thereby removing the need for individual, and often small and under resourced review teams to create and run complex searches across multiple databases and then spend months screening those results for that one single review?
So I’m part of a team managing a project rather uninspiringly called the Embase project. Our aim is to use the crowd to help us keep up with the deluge of publications. We run one very sensitive search in Embase (the largest biomedical database in the world) for trials. This search identifies thousands of citations, as you would expect. Some of the results of this search we feed directly into the Cochrane Central Register of Controlled Trials (CENTRAL). What’s left needs human intervention. It’s these records that we send out to the crowd to classify.
We do this using a citation screening tool. This tool is fundamental to the crowd’s ability to perform the task. We wanted to develop something that focused almost entirely on the task at hand – screening a citation – so the screen is mostly taken up with the citation itself, stripped down to just title and abstract. Built-in, pre-defined highlighted words and phrases help guide screeners to the most relevant parts of a citation: yellow highlights indicate that the record is likely to be describing an RCT, and red highlights indicate that the record is likely to be a Reject. Screeners can also add their own highlights. There are three decision buttons – RCT/CCT, Reject and Unsure – and screeners have to make a decision on every record. Two other features I just want to quickly point out are the all-important progress bar, and the feature which tells you how many others are online screening citations at the same time…
We have a task, we have a tool, we just need a crowd. We’ve not found this difficult. In a year since going live we’ve had over 900 people sign up to take part. The crowd have screened over 110,000 citations and identified 4,000 reports of RCTs.
We’ve been really pleased with those metrics; personally, I’m not surprised by them but I do get a lot of people asking me: why do people do it? I don’t think there’s one answer;
I think many come to it knowing quite a bit about evidence based medicine and the pressures/challenges of producing timely and robust evidence, and they therefore want to help in this effort (and this provides a very real and immediate way they can help);
related to that point I think having made it very easy to contribute has played a significant part in our success. We’ve adopted a rapid onboarding approach; we also offer rapid disembarking – you can stop doing this whenever you want, you are under no obligation and no pressure – this is to fit around you, not the other way round;
and then some other factors which I’m well aware we haven’t realised to their full potential (more related to keeping people doing it) – and that’s around gaining experience, getting feedback on your performance, and perhaps being able to offer some progression or more tailored rewards.
So we’ve managed to recruit a crowd and they have collectively screened well over 100,000 citations. How do we ensure quality? How do we ensure that the records end up in the right place – CENTRAL for RCTs, the ‘bin’ for Rejects? We use a simple yet robust algorithm which goes like this: three consecutive agreements on a record send that record off to CENTRAL or the bin without further intervention. Any disagreements, or any records classified as Unsure, go into a pot for a Resolver-level screener to resolve. Happily this constitutes only about 5% of all records screened.
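The agreement rule can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation; the function and label names are assumptions:

```python
def route_record(decisions):
    """Route a crowd-screened record using the agreement rule:
    three consecutive matching decisions send it straight to
    CENTRAL (RCT) or the bin (Reject); any disagreement or any
    'Unsure' sends it to a Resolver-level screener."""
    first_three = decisions[:3]
    if "Unsure" in first_three or len(set(first_three)) != 1:
        return "Resolver"
    return "CENTRAL" if first_three[0] == "RCT" else "Bin"
```

For example, `route_record(["RCT", "RCT", "RCT"])` returns `"CENTRAL"`, while `route_record(["RCT", "Reject", "RCT"])` returns `"Resolver"`.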
So how well has this algorithm performed? We’ve completed two validation studies so far and two more are underway. Each of those involved taking a random sample of crowd-screened records and having them re-screened by ‘expert screeners’ blind to the crowd decisions. In both validation studies, crowd sensitivity (the crowd’s ability to identify all the RCTs) and crowd specificity (the crowd’s ability to identify records not eligible for CENTRAL) came out at over 99%. We’re happy with that.
So we are pleased with accuracy. What about volume? Are the crowd screening enough? Collectively we are speeding up: the time taken to screen one month’s worth of records has fallen from six weeks in early 2014 to just two weeks in early 2015. This is an exciting place to be…
It means we can:
Look to increase the number of databases we search and screen in this way (i.e. using the crowd), and
provide the crowd with more tasks, helping to identify trials in a much more timely way, with no compromise on quality or accuracy.
And in any such task, the machine plays a vital and complementary role: as we discover more about the very real role that text mining has to play, we can start to see further efficiencies achieved by using both crowd and machine in a way that plays to each one’s strengths – the machine generating the probabilities and the crowd making the accurate collective decisions.