Paper at the ACM Multimedia 2016 Brave New Ideas Session on Societal Impact of Multimedia Research:
Alexis Joly, Hervé Goëau, Julien Champ, Samuel Dufour-Kowalski, Henning Müller, and Pierre Bonnet. 2016. Crowdsourcing Biodiversity Monitoring: How Sharing your Photo Stream can Sustain our Planet. In Proceedings of the 2016 ACM on Multimedia Conference (MM '16). ACM, New York, NY, USA, 958-967.
Paper: https://hal-lirmm.ccsd.cnrs.fr/hal-01373762/document
Pl@ntNet app:
https://play.google.com/store/apps/details?id=org.plantnet&hl=en
2. Context
• Global warming, food crisis and biodiversity erosion
• Accurate knowledge of the distribution and evolution of living species is essential
• Ultimate goal: sustainable and global biodiversity monitoring tools
– Surveillance of global warming consequences, plant & animal diseases, the impact of human activities, and the spread of invasive species
• The taxonomic impediment
– Fewer and fewer people can identify plants and animals
– Fewer and fewer nature observers can produce biodiversity data
3. Pl@ntNet project (launched 2010)
Bridging the taxonomic impediment through an innovative crowdsourcing workflow based on automated plant identification
5. Pl@ntNet app today
- 2.5 M downloads
- 14 M sessions
- 10-50 K users / day
- 150 countries
- Languages: FR, EN, ES, IT, PT, DE, AR, ZH, SK
6. Pl@ntNet data
Validated data = 3% of the queried plant images
- 30K collaboratively revised observations per year (Tela Botanica)
- Publicly available through international initiatives (GBIF, LifeCLEF)
- Validation is a slow and difficult process
7. Pl@ntNet data
Unlabeled data = 97% of the raw query stream
- More than 1 million observations per year (5.1 M today)
- Not exploited yet
- High potential for biodiversity monitoring
9. Species Distribution Modelling from UGC (user-generated content) image streams?
Can we predict Species Distribution Models (real-time and/or long-term) directly from Pl@ntNet mobile search logs?
Or from any other UGC image stream?
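A first step towards a species distribution model from a search log is turning raw geolocated queries into occurrence counts per species and per place. A minimal sketch, assuming a hypothetical log format of (species, latitude, longitude) records where the species label is already known, and a regular latitude/longitude grid as the definition of a "place"; neither the format nor the grid size comes from Pl@ntNet.

```python
import math
from collections import Counter

# Hypothetical record format (an assumption, not the Pl@ntNet log schema):
# (species_name, latitude, longitude)
Record = tuple[str, float, float]


def grid_cell(lat: float, lon: float, cell_deg: float = 1.0) -> tuple[int, int]:
    """Map a coordinate to the index of a regular lat/lon grid cell of size cell_deg degrees."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def occurrence_counts(records: list[Record], cell_deg: float = 1.0) -> Counter:
    """Count the observations of each species i in each grid cell j."""
    counts: Counter = Counter()
    for species, lat, lon in records:
        counts[(species, grid_cell(lat, lon, cell_deg))] += 1
    return counts


if __name__ == "__main__":
    log = [
        ("Quercus ilex", 43.61, 3.87),
        ("Quercus ilex", 43.65, 3.80),
        ("Olea europaea", 43.60, 3.88),
        ("Quercus ilex", 48.85, 2.35),
    ]
    for (species, cell), n in sorted(occurrence_counts(log).items()):
        print(species, cell, n)
```

Raw counts like these mix true abundance with observer behaviour, so they are a starting point for modelling rather than a distribution map in themselves.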
11. Recognizing plants in an open world
An open-set recognition problem
- Tens of thousands of known and unknown classes
- Highly imbalanced training data
We carried out an evaluation within LifeCLEF 2016
- Training set of 1,000 known species (113K pictures)
- Test set = 8K manually annotated Pl@ntNet queries (half known species, half distractors)
- Classification mean Average Precision (MAP) on a subset of 26 invasive species
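Classification mean Average Precision can be computed in several ways. A minimal sketch, assuming the class-wise variant in which each known class acts as a query: test items that received a score for that class are ranked by decreasing score, average precision is computed against the ground truth, and the per-class values are averaged. Distractors (unknown species) carry no known label, so any score they receive for a class counts as a false alarm. The data format and function names are hypothetical, and the official LifeCLEF evaluation may differ in details.

```python
from typing import Optional


def average_precision(ranked_relevance: list[bool], n_relevant: int) -> float:
    """AP of a ranked list: sum of precision@k at each relevant rank k, divided by
    the total number of relevant items (relevant items never ranked contribute 0)."""
    if n_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / n_relevant


def classwise_map(predictions: list[dict[str, float]],
                  ground_truth: list[Optional[str]],
                  classes: list[str]) -> float:
    """predictions: one {class: score} dict per test item (absent class = not predicted).
    ground_truth: true label per item, None for distractors.
    classes: classes to average over (e.g. a subset of invasive species)."""
    aps = []
    for c in classes:
        n_relevant = sum(1 for gt in ground_truth if gt == c)
        if n_relevant == 0:
            continue  # AP is undefined for a class with no test items
        scored = [(p[c], gt == c) for p, gt in zip(predictions, ground_truth) if c in p]
        scored.sort(key=lambda t: t[0], reverse=True)  # highest score first
        aps.append(average_precision([rel for _, rel in scored], n_relevant))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    preds = [{"a": 0.9, "b": 0.1}, {"a": 0.8}, {"b": 0.7}, {"a": 0.95}]
    truth = ["a", "b", "b", None]  # the last item is a distractor
    print(classwise_map(preds, truth, ["a", "b"]))
```

In this toy run, the distractor scored 0.95 for class "a" pushes the only true "a" item to rank 2, halving that class's AP: a small illustration of why unknown species weigh so heavily on this metric.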
12. Recognizing plants in an open world
1. Improve automatic recognition of plants in open-world streams
- Novelty affects all systems, whatever the rejection method used (even supervised ones)
- No rejection method can cope with high novelty rates
→ We are still far from being able to monitor biodiversity in Twitter or Snapchat streams!
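A rejection method decides whether a query belongs to one of the known species or should be flagged as novel. A minimal sketch of one of the simplest unsupervised rules, thresholding the maximum softmax probability, assuming a classifier that outputs one logit per known species; the `threshold` value is a hypothetical parameter to be tuned on validation data, and this is not presented as the rejection method used in the LifeCLEF runs.

```python
import math
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits: Sequence[float], classes: Sequence[str],
                      threshold: float = 0.5) -> Optional[str]:
    """Return the most probable known class, or None ("unknown species")
    when its probability falls below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best] if probs[best] >= threshold else None


if __name__ == "__main__":
    species = ["Quercus ilex", "Olea europaea", "Pinus pinea"]
    print(predict_or_reject([4.0, 0.5, 0.1], species))  # peaked scores -> Quercus ilex
    print(predict_or_reject([1.0, 0.9, 1.1], species))  # flat scores -> None (rejected)
```

A single threshold trades false rejections of known species against false acceptances of unknown ones; when novelty is frequent, some unknown species still receive peaked scores and pass the threshold, which is consistent with the difficulty reported above.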
14. Geo-location and date?
- Not so easy!
- No real success in 5 years of the PlantCLEF challenge
- Why?
- Plant distributions are not well known (this is actually our objective!)
- Habitats are extremely heterogeneous from one species to another (some plants live everywhere while others live only in very specific biotopes)
- What can we do?
- Big occurrence data (like GBIF) might help, but they are biased, heterogeneous and incomplete (no absence data)
- Environmental variables might help, but they are heterogeneous, incomplete, noisy, etc.
→ This will be one of the focuses of LifeCLEF 2017
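One common way to combine occurrence data such as GBIF records with an image classifier is to reweight the visual probabilities by a location prior and renormalize. A minimal sketch under stated assumptions: image and location are conditionally independent given the species, and the visual classifier was trained with a roughly uniform class prior, so that p(s | image, location) ∝ p(s | image) · p(s | location). The inputs `visual_probs` and `local_counts` (occurrence counts near the query location) are hypothetical. Additive smoothing (`alpha`) keeps species with no recorded occurrence from being ruled out, since occurrence data carry no absence information. This illustrates the idea only; it is not a method evaluated in PlantCLEF.

```python
def location_prior(local_counts: dict[str, int], species: list[str],
                   alpha: float = 1.0) -> dict[str, float]:
    """Additively smoothed estimate of p(species | location) from occurrence counts."""
    total = sum(local_counts.get(s, 0) for s in species) + alpha * len(species)
    return {s: (local_counts.get(s, 0) + alpha) / total for s in species}


def fuse(visual_probs: dict[str, float], local_counts: dict[str, int],
         alpha: float = 1.0) -> dict[str, float]:
    """p(s | image, location) proportional to p(s | image) * p(s | location),
    assuming conditional independence and a uniform training prior."""
    species = list(visual_probs)
    prior = location_prior(local_counts, species, alpha)
    unnormalized = {s: visual_probs[s] * prior[s] for s in species}
    z = sum(unnormalized.values())
    return {s: v / z for s, v in unnormalized.items()}


if __name__ == "__main__":
    visual = {"Species A": 0.5, "Species B": 0.5}  # image alone cannot decide
    counts = {"Species A": 8}                      # only A recorded nearby
    print(fuse(visual, counts))
```

Because the prior comes from biased, opportunistic occurrence records, it inherits their sampling bias: a species rarely photographed in a region is penalized even if it is common there.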
16. Using taxonomy?
Taxonomy = a hierarchical classification built by botanists over hundreds of years
→ 600 families > 14K genera > 300K species
But taxonomy is highly heterogeneous and imbalanced
→ Classical hierarchical classification algorithms cannot be used directly
- Some genera contain up to 1,000 very similar species
- But many genera and families include very distinct species
- The long-tail distribution occurs at every level and in every node
[Figure panels: genus Orobanche, genus Bupleurum, family-level example]
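Even without a dedicated hierarchical classifier, the taxonomy can be used to aggregate species-level predictions upward: the probability of a genus is the sum of the probabilities of its species, and likewise for families. This allows backing off to a coarser rank when no single species is confident, for instance inside a genus with many very similar species. A minimal sketch with a hypothetical species → (genus, family) lookup table and a hypothetical confidence `threshold`; it does not address the imbalance problems listed on the slide.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical taxonomy table: species -> (genus, family)
TAXONOMY = {
    "Orobanche minor": ("Orobanche", "Orobanchaceae"),
    "Orobanche hederae": ("Orobanche", "Orobanchaceae"),
    "Bupleurum falcatum": ("Bupleurum", "Apiaceae"),
}


def aggregate(species_probs: dict[str, float],
              taxonomy: dict[str, tuple[str, str]]) -> tuple[dict[str, float], dict[str, float]]:
    """Sum species probabilities into genus-level and family-level probabilities."""
    genus_p: defaultdict = defaultdict(float)
    family_p: defaultdict = defaultdict(float)
    for species, p in species_probs.items():
        genus, family = taxonomy[species]
        genus_p[genus] += p
        family_p[family] += p
    return dict(genus_p), dict(family_p)


def most_specific_confident(species_probs: dict[str, float],
                            taxonomy: dict[str, tuple[str, str]],
                            threshold: float = 0.7) -> Optional[tuple[str, str]]:
    """Return (rank, name) for the finest rank whose top prediction reaches the threshold."""
    genus_p, family_p = aggregate(species_probs, taxonomy)
    for rank, probs in (("species", species_probs), ("genus", genus_p), ("family", family_p)):
        name, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= threshold:
            return rank, name
    return None


if __name__ == "__main__":
    probs = {"Orobanche minor": 0.45, "Orobanche hederae": 0.40, "Bupleurum falcatum": 0.15}
    print(most_specific_confident(probs, TAXONOMY))
```

Here no species reaches 0.7, but their genus does, so the answer backs off to the genus level.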
17. Challenges
1. Improve recognition in open-world streams
2. Use geo-location and date
3. Use taxonomy
4. Optimize and boost training data production
21. Challenges
1. Improve recognition in open-world streams
2. Use geo-location and date
3. Use taxonomy
4. Optimize and boost data validation processes
5. Control bias in Species Distribution Models
22. Modeling bias factors?
Objective: estimate the relative abundance $A_{ij}$ of species $i$ in place $j$, assuming
$$N_{ij} \sim \mathrm{Law}(A_{ij}, B_{ij})$$
- $N_{ij}$: number of observations of species $i$ in place $j$
- $A_{ij}$: abundance of species $i$ in place $j$
- $B_{ij}$: bias, which might be complex because of the diversity of contributors, the opportunistic nature of the observations, and identification confusions
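The slide leaves the law unspecified. As a worked illustration under deliberately simplifying assumptions (not the authors' model), take a Poisson law and let the bias $B_{ij}$ reduce to a sampling effort $E_j > 0$ of place $j$, assumed known and shared by all species. The maximum-likelihood estimate then follows directly, and it shows which kinds of bias cancel in relative abundances and which do not.

```latex
% Hypothetical assumption: N_{ij} ~ Poisson(A_{ij} E_j), i.e. B_{ij} = E_j
\ell(A_{ij}) = N_{ij}\log(A_{ij}E_j) - A_{ij}E_j - \log N_{ij}!
\qquad
\frac{\partial \ell}{\partial A_{ij}} = \frac{N_{ij}}{A_{ij}} - E_j = 0
\;\Rightarrow\; \hat{A}_{ij} = \frac{N_{ij}}{E_j}

% Relative abundance of species i within place j: the effort E_j cancels
\hat{p}_{ij} = \frac{\hat{A}_{ij}}{\sum_k \hat{A}_{kj}} = \frac{N_{ij}}{\sum_k N_{kj}}

% With a species-dependent bias B_{ij} = E_j d_i (e.g. detectability, confusions),
% the expected counts give the proportions of A_{ij} d_i instead of A_{ij}:
\frac{\mathbb{E}[N_{ij}]}{\sum_k \mathbb{E}[N_{kj}]} = \frac{A_{ij} d_i}{\sum_k A_{kj} d_k}
```

Under these assumptions, place-level sampling effort drops out of relative abundances, whereas species-level effects such as detectability or identification confusions do not, one reason why $B_{ij}$ may need explicit modelling.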
23. Conclusion: biodiversity informatics needs MM (multimedia)
Biodiversity dimension | Biodiversity conservation challenge | Who? | Multimedia research topics
Aesthetic | Enjoy and love it | Everybody | IR, recommendation
Diverse | Identify and classify | Taxonomists | Multimodal & large-scale classification
Complex | Decipher & model | Biologists | Multimedia data analytics
Unknown | Discover & associate | Taxonomists | Multimedia data mining
Endangered | Define & implement policies | Decision makers | Visualization, interactivity
Indispensable | Use sustainably | Everybody | Cross-media stream monitoring