Bioschemas: Marking up biodiversity websites to improve data discovery and web-scale integration
1. 1
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
* Wimmics: AI in bridging social semantics and formal semantics on the Web
TDWG Webinar, 2021-03-10
Franck MICHEL*
Bioschemas Community http://bioschemas.org/people/
2. 2
Semantic markup for web pages
4. 4
schema.org: semantic markup for resources on the internet
Collaborative community project founded in 2011 by Google, Microsoft, Yahoo and Yandex
Defines a common vocabulary to mark up resources on the internet
- Structured data makes resources understandable to search engines
- Improves ranking and discoverability
- Provides informative summarizations
Markup formats: Microdata, RDFa, Microformats
5. 5
schema.org: semantic markup for resources on the internet
Markup formats: Microdata, RDFa, Microformats
Source: https://w3techs.com/technologies/history_overview/structured_data/all
6. 6
schema.org: semantic markup for resources on the internet
What we are talking about: types (778)
What we can say about those things: properties (1369)
http://schema.org/Person
7. 7
How to share your biodiversity data?
Options, on a scale from simple to sophisticated: webpages, flat files, Web API, Linked Data KG
Integrative approach: GBIF, EoL, iDigBio…
9. 9
Bioschemas: schema.org extension for life sciences
Community initiative built on top of Schema.org
Aim
Help search engines understand and index webpages
Improve resource discoverability and interoperability
Approach
Reuse/extend Schema.org for life sciences
Keep it simple (no complex domain ontology)
Provide guidelines on how to mark up resources
• Minimum/recommended/optional properties
• Links to other vocabularies & domain ontologies
Flexibility: recommendations, not constraints
Figure: support software, specification, data model, minimum information, controlled vocabularies, cardinality, documentation, examples, new (properties | types)
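The minimum/recommended/optional levels of the guidelines can be read as a checklist applied to a piece of markup. The following is a minimal sketch in Python, not Bioschemas tooling: the `PROFILE` property lists are illustrative placeholders rather than the content of any published profile, the helper names are hypothetical, and treating only the minimum level as required is an assumption.

```python
# Hypothetical profile: these property lists are illustrative placeholders,
# not the content of any published Bioschemas profile.
PROFILE = {
    "minimum": ["name", "taxonRank"],
    "recommended": ["sameAs", "identifier"],
    "optional": ["description"],
}


def check_markup(markup: dict, profile: dict) -> dict:
    """Return, for each level of the profile, the properties missing from the markup."""
    return {
        level: [prop for prop in props if prop not in markup]
        for level, props in profile.items()
    }


def is_compliant(markup: dict, profile: dict) -> bool:
    """Assumption: only minimum properties are required; other levels are advice."""
    return not check_markup(markup, profile)["minimum"]


markup = {
    "@context": "http://schema.org",
    "@type": "Taxon",
    "name": "Delphinapterus leucas (Pallas, 1776)",
    "taxonRank": "species",
}

missing = check_markup(markup, PROFILE)
print(missing)
print(is_compliant(markup, PROFILE))
```

Reporting the missing recommended and optional properties, instead of rejecting the markup, mirrors the "recommendations, not constraints" stance.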
10. 10
Bioschemas: schema.org extension for life sciences
Currently defined terms
‒ ChemicalSubstance
‒ DataCatalog
‒ Dataset
‒ Gene
‒ MolecularEntity
‒ Protein
‒ Sample
‒ Taxon
More terms to come
‒ BioSample
‒ ComputationalTool
‒ ComputationalWorkflow
‒ LabProtocol
‒ Phenotype
‒ ProteinStructure
‒ RNA
‒ TaxonName
‒ …
11. 11
Taxon
Type: https://bioschemas.org/types/Taxon, meant to become http://schema.org/Taxon
Profile: https://bioschemas.org/profiles/Taxon, provides usage recommendations
dwc:vernacularName
12. 12
Example markup of a page about taxon Delphinapterus leucas
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Taxon",
  "name": "Delphinapterus leucas (Pallas, 1776)",
  "taxonRank": "species"
}
</script>
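On a site with many taxon pages, markup like this would presumably be generated from a database rather than written by hand. The following is a minimal sketch in Python, assuming a hypothetical record layout (`name`, `rank`) and a hypothetical helper `taxon_script`; only the `@context`, `@type`, `name` and `taxonRank` keys come from the example above.

```python
import json


def taxon_script(name: str, rank: str) -> str:
    """Build the JSON-LD <script> element for one taxon page."""
    markup = {
        "@context": "http://schema.org",
        "@type": "Taxon",
        "name": name,
        "taxonRank": rank,
    }
    body = json.dumps(markup, indent=2, ensure_ascii=False)
    # Escape "</" so that a value can never close the <script> element early.
    # "\/" is a valid JSON escape for "/", so the data itself is unchanged.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical records, standing in for rows of a taxonomic database.
records = [
    {"name": "Delphinapterus leucas (Pallas, 1776)", "rank": "species"},
    {"name": "Delphinapterus", "rank": "genus"},
]

scripts = [taxon_script(r["name"], r["rank"]) for r in records]
print(scripts[0])
```

Each generated string can be placed in the `<head>` or `<body>` of the corresponding page.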
14. 14
Example markup of a page about taxon Delphinapterus leucas
<script type="application/ld+json">
{
  ...
  "sameAs": [
    "http://doris.ffessm.fr/Especes/Delphinapterus-leucas-Beluga-868",
    "http://www.marinespecies.org/aphia.php?p=taxdetails&id=137115",
    "http://www.iucnredlist.org/details/6335"
  ],
  "identifier": [
    "60932",
    {
      "@type": "PropertyValue",
      "name": "WoRMS id",
      "propertyID": "http://www.wikidata.org/entity/P850",
      "value": "137115"
    }
  ]
}
</script>
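To illustrate how a consumer might read this markup back, here is a minimal sketch in Python using only the standard library. The class `JsonLdExtractor`, the helper `identifiers` and the `PAGE` string are hypothetical names introduced for the example; `PAGE` is a cut-down version of the markup shown above, not a real page.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def identifiers(item: dict) -> list:
    """Return (label, value) pairs; plain string identifiers get the label None."""
    values = item.get("identifier", [])
    if not isinstance(values, list):
        values = [values]
    pairs = []
    for value in values:
        if isinstance(value, dict):
            pairs.append((value.get("name"), value.get("value")))
        else:
            pairs.append((None, value))
    return pairs


# Hypothetical page: a cut-down version of the markup shown above.
PAGE = """<html><head><script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Taxon",
 "sameAs": ["http://www.marinespecies.org/aphia.php?p=taxdetails&id=137115"],
 "identifier": ["60932",
   {"@type": "PropertyValue", "name": "WoRMS id",
    "propertyID": "http://www.wikidata.org/entity/P850", "value": "137115"}]}
</script></head><body></body></html>"""

parser = JsonLdExtractor()
parser.feed(PAGE)
parser.close()
taxon = parser.items[0]
print(taxon["sameAs"])
print(identifiers(taxon))
```

The `identifier` property mixes plain strings and `PropertyValue` objects, which is why the helper normalizes both into pairs.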
15. 15
What about name registries such as IPNI, ZooBank, MycoBank?
Photo: https://commons.wikimedia.org/wiki/File:Name_label.JPG
16. 16
TaxonName
Type: https://bioschemas.org/types/TaxonName, meant to become http://schema.org/TaxonName
Profile: https://bioschemas.org/profiles/TaxonName, provides usage recommendations
Taxon vs. TaxonName, discussion: https://github.com/BioSchemas/specifications/issues/309
18. 18
Live deployments
Photo: https://www.flickr.com/photos/35034363287@N01/2284904309
19. 19
Early deployment at NMNH Paris
180,000+ pages marked up with Taxon & TaxonName types
https://inpn.mnhn.fr/espece/cd_nom/60932
https://search.google.com/structured-data/testing-tool
20. 20
Early deployments https://bioschemas.org/liveDeploys/
NMNH Paris: Taxon & TaxonName, 180K pages
GBIF: Taxon & TaxonName, 3M pages
Scholia (https://scholia.toolforge.org): Taxon, 2.7M pages. Scientific bibliographic information based on Wikidata.
PIPPA (PSB Interface for Plant Phenotype Analysis): Taxon ↔ BioChemEntity
OpaleSurfCasting.net: Taxon. French leisure sea fishing legislation.
Why do early deployments matter?
• A way for the community to show its interest in having these terms
• Necessary for Schema.org to endorse new types
• First step to foster novel applications (chicken & egg)
21. 21
Next steps
22. 22
Bioschemas work on biodiversity
https://bioschemas.org/groups/Biodiversity/
Currently:
Taxon, TaxonName: links to DwC terms
Future:
Specimen: links to ABCD, openDS, MIDS?
Traits: links to traits ontologies?
Occurrence: links to DwC occurrences?
…
23. 23
Marking up biodiversity resources… at scale
GBIF, EoL, CoL, iDigBio, DiSSCo…
Museum collections,
Literature (BHL, Plazi…),
Citizen science platforms,
Independent institutions,
Associations,
…
24. 24
Marking up webpages
Let’s have search engines do the job for us!
• Connect pieces of data at web scale
• The first step for data integration is discovery
• Dataset search engines
• What about a Species Search Engine?
• …
Take-aways
• Increases data visibility and discoverability
• Relatively inexpensive
• Connects unconnected pieces of data, e.g. “grey literature”
Not the magic bullet
• Name discrepancies
• Compliance with nomenclature
• How to name taxonomic ranks
• …
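The idea of connecting pieces of data at web scale can be illustrated with shared `sameAs` links: pages from unrelated sites that point to the same external URI can be grouped together. This is a minimal sketch in Python under stated assumptions, not how any search engine actually works. The function `link_pages` is hypothetical and the page URLs are invented for the example.

```python
from collections import defaultdict


def link_pages(pages: dict) -> dict:
    """Group page URLs by the external URIs they declare in `sameAs`.

    `pages` maps a page URL to its parsed JSON-LD markup (a dict).
    Only URIs shared by at least two pages are returned.
    """
    index = defaultdict(set)
    for url, markup in pages.items():
        same_as = markup.get("sameAs", [])
        if isinstance(same_as, str):  # a single value or a list are both allowed
            same_as = [same_as]
        for uri in same_as:
            index[uri].add(url)
    return {uri: sorted(urls) for uri, urls in index.items() if len(urls) > 1}


# Hypothetical harvest: the page URLs below are invented for illustration.
WORMS = "http://www.marinespecies.org/aphia.php?p=taxdetails&id=137115"
pages = {
    "https://museum.example.org/taxon/60932": {"@type": "Taxon", "sameAs": [WORMS]},
    "https://fishing.example.net/beluga": {"@type": "Taxon", "sameAs": WORMS},
    "https://other.example.com/orca": {"@type": "Taxon", "sameAs": []},
}

links = link_pages(pages)
print(links)
```

Matching on shared URIs sidesteps name discrepancies only when sites agree on which external record to link to, which is one reason markup alone is not the magic bullet.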
25. 25
https://bioschemas.org/
https://github.com/BioSchemas/specifications/wiki
Questions?