Crowdsourcing Biodiversity Monitoring: How Sharing your Photo Stream can Sustain our Planet

http://www.plantnet-project.org/
Alexis Joly, Hervé Goëau, Julien Champ, Samuel Dufour-Kowalski,
Henning Müller, Pierre Bonnet
Acknowledgement: Nozha Boujemaa, Daniel Barthelemy,
Jean-François Molino

Context
• Global warming, food crisis and biodiversity erosion
• Accurate knowledge of the distribution and evolution of living species is essential
• Ultimate goal: sustainable and global biodiversity monitoring tools
– Surveillance of global warming consequences, plant & animal diseases, impact of human activities, propagation of invasive species
• The taxonomic impediment
– Fewer and fewer people can identify plants and animals
– Fewer and fewer nature observers can produce biodiversity data

Pl@ntNet project (launched 2010)
Bridging the taxonomic impediment thanks to an innovative
crowdsourcing workflow based on automated plant identification

Pl@ntNet project (launched 2010)
The positive feedback loop does work!

Pl@ntNet app today
2.5 M downloads
14 M sessions
10-50 K users / day
150 countries
Languages: FR, EN, ES, IT, PT, DE, AR, ZH, SK

Pl@ntNet data
Validated data = 3% of the queried plant images
- 30K collaboratively revised observations per year (TelaBotanica)
- Publicly available through international initiatives (GBIF, LifeCLEF)
- Validation is a slow and hard process

Pl@ntNet data
Unlabeled data = 97% of the raw query stream
- More than 1 million observations per year (5.1M today)
- Not exploited today
- A high potential for biodiversity monitoring

Species Distribution Modelling from user-generated content (UGC) image streams?
Can we predict Species Distribution Models (real-time and/or long-term) directly from Pl@ntNet mobile search logs?
Or from any other UGC image stream?

Challenges
1. Improve recognition in open-world streams

Recognizing plants in an open world
An open-set recognition problem
- With tens of thousands of known and unknown classes
- Highly imbalanced training data
We carried out an evaluation within LifeCLEF 2016
- Training set of 1000 known species (113K pictures)
- Test set = 8K manually annotated Pl@ntNet queries (half known, half distractors)
- Metric: classification Mean Average Precision on a subset of 26 invasive species
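The Mean Average Precision metric used in this evaluation can be illustrated with a minimal sketch. It assumes that every test query receives a score for each species of interest and that distractor queries simply count as non-relevant. The function names and the toy data are hypothetical, and this is not the official LifeCLEF evaluation code.

```python
def average_precision(scores, is_relevant):
    """Average precision for one species.

    Queries are ranked by decreasing score; precision is accumulated at the
    rank of every relevant query (distractors are plain negatives).
    """
    ranked = sorted(zip(scores, is_relevant), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_species):
    """Mean of the per-species average precisions."""
    aps = [average_precision(scores, rel) for scores, rel in per_species.values()]
    return sum(aps) / len(aps)


# Toy data (hypothetical): four queries scored against two species.
toy = {
    "species_a": ([0.9, 0.8, 0.3, 0.1], [True, False, True, False]),
    "species_b": ([0.2, 0.7, 0.6, 0.4], [False, True, False, False]),
}
print(mean_average_precision(toy))
```

A distractor that receives a high score pushes relevant queries down the ranking, which is how novelty in the stream lowers the metric.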

Recognizing plants in an open world
1. Improve automatic recognition of plants in open-world streams
- Novelty affects all systems, whatever the rejection method used (even supervised)
- No rejection method can cope with high novelty rates
→ We are still far from being able to monitor biodiversity in Twitter or Snapchat streams!
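For readers unfamiliar with rejection, one of the simplest baselines is to threshold the classifier's top confidence. The sketch below is only an illustration of that idea, not one of the methods evaluated in the slides; the function names and the threshold value of 0.5 are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw classifier scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits, labels, threshold=0.5):
    """Return (label, confidence), or (None, confidence) when the top
    probability falls below the (hypothetical) rejection threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]  # treated as "unknown class"
    return labels[best], probs[best]


labels = ["species_a", "species_b", "species_c"]
print(predict_or_reject([5.0, 0.0, 0.0], labels))   # confident prediction
print(predict_or_reject([0.1, 0.0, 0.05], labels))  # flat scores, rejected
```

Such a threshold only helps when unknown content produces flat scores, which is consistent with the observation that rejection breaks down at high novelty rates.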

Challenges
2. Use geo-location and date

Geo-location and date?
- Not so easy!
- No real success in 5 years of the PlantCLEF challenge
- Why?
  - Plant distributions are not well known (this is actually our objective!)
  - Habitats are extremely heterogeneous from one species to another (some plants live everywhere while others live in very specific biotopes)
- What can we do?
  - Big occurrence data (like GBIF) might help but is biased, heterogeneous and incomplete (no absence data)
  - Environmental variables might help but are heterogeneous, incomplete, noisy, etc.
→ This will be one of the focuses of LifeCLEF 2017

Challenges
3. Use taxonomy

Using taxonomy?
Taxonomy = a hierarchical classification built by botanists over hundreds of years
→ 600 families > 14K genera > 300K species
But taxonomy is highly heterogeneous and imbalanced
→ Classical hierarchical classification algorithms cannot be used directly
- Some genera have up to 1000 very similar species
- But many genera and families include very distinct species
- The long-tail distribution occurs at each level and in each node
[Figure labels: Genus Orobanche; Genus Bupleurum; Family Bupleurum]
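As a minimal illustration of how the hierarchy could be exploited, species-level probabilities can be summed up to their parent genus. This is a sketch under simple assumptions, not the approach of the slides: the `roll_up` helper, the species-to-genus mapping and the probabilities are made up for the example.

```python
from collections import defaultdict


def roll_up(species_probs, parent_of):
    """Sum species-level probabilities into their parent taxon."""
    totals = defaultdict(float)
    for species, prob in species_probs.items():
        totals[parent_of[species]] += prob
    return dict(totals)


# Hypothetical classifier output for one picture.
species_probs = {
    "Orobanche minor": 0.30,
    "Orobanche hederae": 0.25,
    "Bupleurum falcatum": 0.45,
}
parent_of = {
    "Orobanche minor": "Orobanche",
    "Orobanche hederae": "Orobanche",
    "Bupleurum falcatum": "Bupleurum",
}

genus_probs = roll_up(species_probs, parent_of)
print(max(genus_probs, key=genus_probs.get))
```

In this toy case the most likely species belongs to Bupleurum, yet the most likely genus is Orobanche, because the probability mass is split between two similar species. The imbalance described above is what makes such a naive aggregation unreliable in practice.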

Challenges
3. Use taxonomy
4. Optimize and boost training data production

Pro-active crowdsourcing
[Diagram labels: Classifier (CNN); Task selection & assignment; Annotators (heterogeneous skills)]

Training
1. ConvNet predictions
2. Create quizzes by Monte Carlo sampling
3. Sort quizzes by difficulty (= success expectation across all workers)
[Figure labels: Beginner; Intermediate]
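The three training steps can be sketched as follows. The slides do not give the exact procedure, so everything here is an assumption: a quiz is taken to be a set of candidate species drawn without replacement in proportion to the ConvNet probabilities, the per-worker success estimates are supplied by hand, and "sorted by difficulty" is read as easiest first.

```python
import random


def sample_quiz(probs, n_choices, rng):
    """Monte Carlo draw of n_choices distinct candidate species,
    each draw weighted by the ConvNet probability."""
    remaining = dict(probs)
    choices = []
    while remaining and len(choices) < n_choices:
        names = list(remaining)
        weights = [remaining[name] for name in names]
        if sum(weights) <= 0:
            break  # nothing left with positive probability
        picked = rng.choices(names, weights=weights, k=1)[0]
        choices.append(picked)
        del remaining[picked]
    return choices


def expected_success(quiz_id, worker_skill):
    """Mean success probability of a quiz across all workers."""
    rates = [skills[quiz_id] for skills in worker_skill.values()]
    return sum(rates) / len(rates)


def sort_by_difficulty(quiz_ids, worker_skill):
    """Easiest quizzes first (highest expected success)."""
    return sorted(quiz_ids, key=lambda q: expected_success(q, worker_skill), reverse=True)


rng = random.Random(0)
convnet_probs = {"species_a": 0.6, "species_b": 0.3, "species_c": 0.1}  # hypothetical
print(sample_quiz(convnet_probs, 2, rng))

# Hypothetical per-worker success estimates for two quizzes.
worker_skill = {
    "worker_1": {"quiz_1": 0.9, "quiz_2": 0.2},
    "worker_2": {"quiz_1": 0.7, "quiz_2": 0.4},
}
print(sort_by_difficulty(["quiz_2", "quiz_1"], worker_skill))
```

Sampling in proportion to the classifier scores tends to put visually confusable species in the same quiz, which is presumably what makes the quizzes informative for training.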

Experiments: Simpson’s paradox
[Plot: identification success rate vs. declared expertise]
Workers are assigned tasks they have been trained on before!

Challenges
3. Use taxonomy
4. Optimize and boost data validation processes
5. Control bias in Species Distribution Models

Modeling bias factors?
Objective: estimate the relative abundance $A_{ij}$ of species $i$ in place $j$, supposing
$N_{ij} \sim \mathrm{Law}(A_{ij}, B_{ij})$
$N_{ij}$: number of observations of $i$ in $j$
$A_{ij}$: abundance of $i$ in $j$
$B_{ij}$: bias, which may be complex because of the diversity of contributors, the opportunistic nature of the observations, and confusions
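The slides leave the observation law unspecified. One simple way to make it concrete, purely as an assumption, is a Poisson model with a multiplicative bias: the number of observations of species i in place j is Poisson with mean equal to the abundance times the bias, and the bias is the product of an observation effort per place and a detectability per species. Under that model the maximum-likelihood estimate of the abundance is the observed count divided by the bias. All names and numbers below are hypothetical.

```python
def abundance_index(counts, effort, detectability):
    """Bias-corrected abundance estimate per (species, place).

    Assumes N_ij ~ Poisson(A_ij * B_ij) with B_ij = effort[place] * detectability[species],
    so the maximum-likelihood estimate of A_ij is N_ij / B_ij.
    """
    return {
        (species, place): n / (effort[place] * detectability[species])
        for (species, place), n in counts.items()
    }


# Hypothetical observation counts.
counts = {
    ("oak", "park"): 30,
    ("orchid", "park"): 5,
    ("oak", "forest"): 8,
}
effort = {"park": 10.0, "forest": 2.0}        # e.g. relative number of visits
detectability = {"oak": 1.0, "orchid": 0.5}   # chance of being noticed and photographed

for key, value in abundance_index(counts, effort, detectability).items():
    print(key, value)
```

In this toy example the oak is observed far more often in the park than in the forest, yet its corrected index is higher in the forest, because the park receives five times more observation effort. In practice the bias terms are unknown and have to be estimated, which is the hard part.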

Conclusion: biodiversity informatics needs multimedia (MM)
Biodiversity dimension | Biodiversity conservation challenge | Who? | Multimedia research topics
Aesthetic | Enjoy and love it | Everybody | IR, Recommendation
Diverse | Identify and classify | Taxonomists | Multimodal & large-scale classification
Complex | Decipher & model | Biologists | Multimedia data analytics
Unknown | Discover & associate | Taxonomists | Multimedia data mining
Endangered | Define & implement policies | Decision makers | Visualization, Interactivity
Indispensable | Use sustainably | Everybody | Cross-media streams monitoring
