Role of crowdsourcing
1. Data and text mining workshop
The role of crowdsourcing
Anna Noel-Storr
Wellcome Trust, London, Friday 6th March 2015
2. What is crowdsourcing?
“…the practice of obtaining needed services, ideas, or content by soliciting contributions
from a large group of people, and especially from an online community, rather than from
traditional employees…”
Image credit: DesignCareer
8. Micro-tasking: process
Breaking down a large corpus of data into smaller units
and distributing those units to a large online crowd
“the distribution of small parts of a problem”
9. Human computation
Humans remain better than machines at certain tasks:
e.g. Identifying pizza toppings from a picture of a pizza
e.g. “preventing obesity without eating like a rabbit”.ti. – autotag: Animal study
10. Tools and platforms
What platforms and tools exist and how do they work?
Image credit: ThinkStock
11. The Zooniverse
“each project uses the efforts and ability of volunteers to help
scientists and researchers deal with the flood of data that confronts them”
13. Health related evidence production
Can we use crowdsourcing to identify the
evidence in a more timely way?
- Known pressure point within the review production
- Between 2000 and 5000 citations per new review, but can be much more
- A not much loved task
Trial
identification
14. The Embase project
Step 1: Run a very sensitive search for studies in Embase, the largest biomedical database.
Step 2: Use a crowd to screen thousands of search results from Embase and feed the identified reports of RCTs into CENTRAL, Cochrane's Central Register of Controlled Trials.
[Diagram: Embase search results flow into CENTRAL by two routes – an automatic feed and crowd screening]
How will the crowd do this?
16. The Embase project: recruitment
- 900+ people have signed up to screen citations in 12 months
- 110,000+ citations have been collectively screened
- 4,000 RCTs/q-RCTs identified by the crowd
[Graph: cumulative number of participants, Feb 2014 to Mar 2015, rising from 0 to over 900]
17. Why do people do it?
- Made it very easy to participate (and equally easy to stop!)
- Gain experience (bulk up the CV)
- Provide feedback, both to the individual and to the community (people are more likely to come back)
- Wanting to do something to contribute (healthcare is a strong hook)
18. How accurate is the crowd?
[Diagram: three consecutive RCT decisions send a record to CENTRAL; three consecutive Reject decisions send it to the bin; disagreements and Unsure records go to a Resolver – about 5% of all records]
19. Crowd accuracy

Validation 1 – the crowd: index test; the info specialists: reference standard
TP: 1565   FP: 9
FN: 2      TN: 2888
Sensitivity: 99.9%   Specificity: 99.7%
Enriched sample; blinded to crowd decision; dual independent screeners as reference standard

Validation 2 – the crowd: index test; the info specialist: reference standard
TP: 415   FP: 5
FN: 1     TN: 2649
Sensitivity: 99.8%   Specificity: 99.8%
Enriched sample; blinded to crowd decision; single independent expert screener (me!) as reference standard; possibility of incorporation bias
Individual screener accuracy is also carefully monitored
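The headline figures above follow directly from the confusion matrices. A minimal sketch of the calculation (the function name is illustrative; the TP/FP/FN/TN counts are the ones reported on the slide):

```python
def sens_spec(tp, fp, fn, tn):
    """Return (sensitivity, specificity) as fractions."""
    sensitivity = tp / (tp + fn)   # share of true RCTs the crowd caught
    specificity = tn / (tn + fp)   # share of non-RCTs the crowd correctly rejected
    return sensitivity, specificity

v1 = sens_spec(tp=1565, fp=9, fn=2, tn=2888)   # Validation 1
v2 = sens_spec(tp=415, fp=5, fn=1, tn=2649)    # Validation 2
print(f"Validation 1: sensitivity {v1[0]:.1%}, specificity {v1[1]:.1%}")
print(f"Validation 2: sensitivity {v2[0]:.1%}, specificity {v2[1]:.1%}")
```

Rounded to one decimal place, these reproduce the 99.9%/99.7% and 99.8%/99.8% figures quoted for the two validation studies.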
20. How fast is the crowd?
Length of time to screen one month's worth of records:
[Graph: 6 weeks (Jan 2014) → 5 weeks (Jul 2014) → 2 weeks (Jan 2015)]
More screeners, and more screeners screening more quickly
21. More of the same, and more tasks
As the crowd becomes more efficient, we plan to do two things:
1. Increase the databases we search – feed in more citations from other databases
2. Offer other 'micro-tasks': screen (Y/N → bin), annotate, appraise – e.g. is the healthcare condition Alzheimer's disease? Y, N, Unsure
And in these tasks the machine plays a vital and complementary role…
23. In summary
Crowdsourcing:
• Effective method in large-scale study identification
• Identifies more studies, more quickly
• No compromise on quality or accuracy
• Offers meaningful ways to contribute
• Feasible to recruit a crowd
• Highly functional tool
• Complements data and text mining
And enables the move towards the living review.
Editor's notes
I’m going to talk about the role that crowdsourcing can play in the evidence synthesis process and importantly in the move towards the ‘living review’.
But first: What is crowdsourcing? Broadly speaking it’s: “…”
There are different types – and therefore different approaches and tools are needed – depending on what it is you need or want from the crowd. Brabham's problem-focused crowdsourcing typology is made up of four types: 1. Knowledge discovery and management – where you task your crowd with finding and collecting information into a common location and format.
2. Broadcast search: where the organisation tasks the crowd with solving an empirical problem
3. Peer vetted creative production where the organisation tasks a crowd with creating and selecting creative ideas, and…
4. Distributed human-intelligence tasking which is where the organisation tasks a crowd with analysing large amounts of information
And it’s this fourth type I primarily want to focus on today.
It’s about taking a large corpus of data and breaking it down into much smaller units which are then distributed via the internet to a community of willing volunteers to process. The distribution of small parts of a problem
Some call this kind of work human computation or human intelligence tasking, because these are tasks where human intelligence is still needed, and where humans still outperform the machine. Such as identifying pizza toppings from an image of a pizza, or, of more relevance to us perhaps: recognizing very quickly that an article is not actually about rabbits just because it has rabbit in the title…
So what tools and platforms exist and what data does the crowd help to process?
The Zooniverse, maintained and developed by the Citizen Science Alliance, is the largest and most successful citizen science platform. It began with one project, Galaxy Zoo, over eight years ago. The platform now hosts over 30 projects and has a worldwide community of almost 1.3 million people. In their words, each project uses the “efforts and ability of volunteers to help scientists and researchers deal with the flood of data that confronts them”. Their focus began on all things space related but they have branched out into the humanities and into aspects of healthcare research.
Here are two examples from two different projects hosted on the Zooniverse. The first, Galaxy Zoo, shows the volunteer an image of a galaxy and then asks a series of questions about that image, such as is it in a spiral shape? The second, Operation War Diary, gets volunteers to tag pages of war diaries.
In our field, that of evidence appraisal, production and dissemination, we face similar challenges in keeping up with the amount of data produced. And within Cochrane we have been exploring the role of crowdsourcing in helping us to process the flood of data.
Our efforts so far have largely focused on one well-known pressure point within the review production process: trial identification. Our traditional model is under increasing strain as research output grows exponentially. The identified ‘micro-task’ within this broader task of trial identification is citation screening. It is estimated that the average new systematic review identifies between 2000 and 5000 citations, but this can be much higher for reviews in certain domains or considering certain types of intervention.
What if we could find a reliable and fast way to feed all reports of randomised trials into one central repository thereby removing the need for individual, and often small and under resourced review teams to create and run complex searches across multiple databases and then spend months screening those results for that one single review?
So I’m part of a team managing a project rather uninspiringly called the Embase project. Our aim is to use the crowd to help us keep up with the deluge of publications. We run one very sensitive search in Embase (the largest biomedical database in the world) for trials. This search identifies thousands of citations, as you would expect. Some of the results of this search we feed directly into the Cochrane Central Register of Controlled Trials (CENTRAL). What’s left needs human intervention. It’s these records that we send out to the crowd to classify.
We do this using a citation screening tool. This tool is fundamental to the crowd’s ability to perform the task. We wanted to develop something that focused almost entirely on the task at hand – screening a citation – so the screen is mostly taken up with the citation itself, stripped down to just title and abstract. Built-in, pre-defined highlighted words and phrases help guide screeners to the most relevant parts of a citation: yellow highlights indicate that the record is likely to be describing an RCT, and red highlights indicate that the record is likely to be a Reject. Screeners can also add their own highlights. There are three decision buttons – RCT/CCT, Reject and Unsure – and screeners have to make a decision on every record. Two other features I just want to quickly point out are the all-important progress bar, and the feature which tells you how many others are online screening citations at the same time…
We have a task, we have a tool, we just need a crowd. We’ve not found this difficult. In a year since going live we’ve had over 900 people sign up to take part. The crowd have screened over 110,000 citations and identified 4,000 reports of RCTs.
We’ve been really pleased with those metrics; personally, I’m not surprised by them but I do get a lot of people asking me: why do people do it? I don’t think there’s one answer;
I think many come to it knowing quite a bit about evidence based medicine and the pressures/challenges of producing timely and robust evidence, and they therefore want to help in this effort (and this provides a very real and immediate way they can help);
related to that point I think having made it very easy to contribute has played a significant part in our success. We’ve adopted a rapid onboarding approach; we also offer rapid disembarking – you can stop doing this whenever you want, you are under no obligation and no pressure – this is to fit around you, not the other way round;
and then some other factors which I’m well aware we haven’t realised to their full potential (more related to keeping people doing it) – and that’s around gaining experience, getting feedback on your performance, and perhaps being able to offer some progression or more tailored rewards.
So we’ve managed to recruit a crowd and they have collectively screened well over 100,000 citations. How do we ensure quality? How do we ensure that the records end up in the right place – CENTRAL for RCTs, the ‘bin’ for Rejects? We use a simple yet robust algorithm which goes like this: three consecutive agreements on a record send that record off to CENTRAL or the bin without further intervention. Any disagreements, or any records classified as Unsure, go into a pot for a Resolver-level screener to resolve. Happily this constitutes only about 5% of all records screened.
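The agreement rule can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation; the function and label names are assumptions:

```python
def route_record(decisions):
    """Route a crowd-screened record using the agreement rule:
    three consecutive matching decisions send it straight to
    CENTRAL (RCT) or the bin (Reject); any disagreement or any
    'Unsure' sends it to a Resolver-level screener."""
    first_three = decisions[:3]
    if "Unsure" in first_three or len(set(first_three)) != 1:
        return "Resolver"
    return "CENTRAL" if first_three[0] == "RCT" else "Bin"
```

For example, `route_record(["RCT", "RCT", "RCT"])` returns `"CENTRAL"`, while `route_record(["RCT", "Reject", "RCT"])` returns `"Resolver"`.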
So how well has this algorithm performed? We’ve completed two validation studies so far and two more are underway. Each of those involved taking a random sample of crowd-screened records and having them re-screened by ‘expert screeners’ blind to the crowd decisions. In both validation studies, crowd sensitivity (the crowd’s ability to identify all the RCTs) and crowd specificity (the crowd’s ability to identify records not eligible for CENTRAL) came out at over 99%. We’re happy with that.
So we are pleased with accuracy. What about volume? Are the crowd screening enough? Collectively we are speeding up: the time taken to screen one month’s worth of records has fallen from six weeks in early 2014 to just two weeks in early 2015. This is an exciting place to be…
It means we can:
Look to increase the number of databases we search and screen in this way (i.e. using the crowd), and
provide the crowd with more tasks, helping to identify trials in a much more timely way, with no compromise on quality or accuracy.
And in any such task, the machine plays a vital and complementary role: as we discover more about the very real role that text mining has to play, we can start to see further efficiencies achieved by using both crowd and machine in a way that plays to each one’s strengths – the machine generating the probabilities and the crowd making the accurate collective decisions.