SlideShare une entreprise Scribd logo
1  sur  39
Télécharger pour lire hors ligne
1
Finding Small Molecules
in Big Data
Emma Schymanski
Luxembourg Centre for Systems Biomedicine (LCSB),
University of Luxembourg
Email: emma.schymanski@uni.lu
Antony J. Williams
National Centre for Computational Toxicity (NCCT),
US Environmental Protection Agency (US EPA), NC, USA
Image © www.seanoakley.com/
The views expressed in this presentation are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency.
2
Small molecules … big problems?
E. coli data provided by N. Zamboni, IMSB, ETH Zürich.
o Status quo of small molecules:
• How many are in compound databases?
• How many could there be?
• How many are in spectral libraries?
• Communicating between libraries
o Exchanging “expert knowledge”
• Suspect Lists in Europe
• Live, retrospective screening & untargeted MS
o Tackling Complex Structures
• Exchanging Information on Unknowns
• …and how Open Science helps!
Cells
3
Searching for Small Molecules …
o Compound Databases
Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
PubChem: >95 million
https://pubchem.ncbi.nlm.nih.gov/
ChemSpider: >60 million
http://www.chemspider.com/
CompTox Dashboard: >761 000
https://comptox.epa.gov/dashboard/
Human Metabolome DB (HMDB): >115 000
http://www.hmdb.ca/
4
Searching for Small Molecules …
o Compound Databases … isn’t 95 million enough?
Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
Quick answer … NO!
E. coli data :N. Zamboni, IMSB, ETH Zürich
in silico prediction
5
Searching for More Small Molecules …
o In silico metabolite prediction – example of MINE (2015)
Jeffryes et al, 2015, MINEs, J. Cheminf, 7:44. DOI: 10.1186/s13321-015-0087-1
KEGG MINE
13,307 => 571,368
EcoCyc MINE
1,832 => 54,719
YMDB MINE
1,978 => 100,755
HMDB [15] MINE
23,035 => 400,414
6
Searching for More Small Molecules …
o In silico metabolite prediction – example of MINE (2015)
• First generation only … combinatorial explosion!
Jeffryes et al, 2015, MINEs, J. Cheminf, 7:44. DOI: 10.1186/s13321-015-0087-1
KEGG MINE
13,307 => 571,368
EcoCyc MINE
1,832 => 54,719
YMDB MINE
1,978 => 100,755
HMDB [15] MINE
23,035 => 400,414
Speculation …
PubChem MINE
95 million => 1.6 billion … first generation only?!?!
7
Searching for EVEN MORE Small Molecules …
Source: A. Kerber, R. Laue, M. Meringer, C. Rücker (2005) MATCH 54 (2), 301-312.
o Structure Generation
• But of course most of these do not exist
Molecular Mass
NumberofStructures
50 70 90 110 130 150
1100100001000000100000000
NIST MS LibraryNIST MS Library
Beilstein Registry
NIST MS Library
Beilstein Registry
Molecular Graphs
Structure Generation
100 million at MW 150
NIST MS Library
~1-200 at MW 150
Spectral Libraries
8
Searching for Small Molecules in Spectral Libraries
o … to find what is “on record”…
• Too many different MS/MS libraries (and they are still too small)
Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
9
Do we need all these libraries?
o Yes … most libraries still have many unique entries
Vinaixa, Schymanski, Navarro, Neumann, Salek, Yanes, 2016, TrAC, DOI: 10.1016/j.trac.2015.09.005
= HMDB,
GNPS,
MassBank,
ReSpect
Compound lists
provided by:
S. Stein, R. Mistrik, Agilent
10
SPLASH – Communicate between libraries
SPectraL hASH – an identifier for mass spectra
splash10 - 0002 - 0900000000 - b112e4e059e1ecf98c5f
[version] - [top10] - [histogram] - [hash of full spectrum]
http://mona.fiehnlab.ucdavis.edu/#/spectra/splash/splash10-0002-0900000000-b112e4e059e1ecf98c5f
https://www.google.ch/search?q=splash10-0002-0900000000-b112e4e059e1ecf98c5f
http://splash.fiehnlab.ucdavis.edu/
Wohlgemuth et al., 2016, Nature Biotechnology 34, 1099-1101, DOI: 10.1038/nbt.3689
11
SPLASH – Communicate between libraries
http://splash.fiehnlab.ucdavis.edu/
Wohlgemuth et al., 2016, Nature Biotechnology 34, 1099-1101, DOI: 10.1038/nbt.3689
Source: www.massbank.eu; http://gnps.ucsd.edu/, www.drugbank.ca; www.mzcloud.org
12
SPLASH – Communicate between libraries
Wohlgemuth et al., 2016, Nature Biotechnology 34, 1099-1101, DOI: 10.1038/nbt.3689
splash10 - 0002 - 0900000000 - b112e4e059e1ecf98c5f
[version] - [top10] - [histogram] - [hash of full spectrum]
http://mona.fiehnlab.ucdavis.edu/#/spectra/splash/splash10-0002-0900000000-b112e4e059e1ecf98c5f
https://www.google.ch/search?q=splash10-0002-0900000000-b112e4e059e1ecf98c5f
13
Small molecules … big problems?
E. coli data provided by N. Zamboni, IMSB, ETH Zürich.
o Status quo of small molecules:
• How many are in compound databases?
• How many could there be?
• How many are in spectral libraries?
• Communicating between libraries
o Exchanging “expert knowledge”
• Suspect Lists in Europe
• Live, retrospective screening & untargeted MS
o Tackling Complex Structures
• Exchanging Information on Unknowns
• …and how Open Science helps!
Cells
14
2015: European Non-target Screening Trial
Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
Croatian
Water
RWS
15
European (World-)Wide Exchange of Suspects
Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7; Alygizakis et al. 2018 ES&T, DOI: 10.1021/acs.est.8b00365
NORMAN Suspect List Exchange:
http://www.norman-network.com/?q=node/236
16
NORMAN Suspect List Exchange
o http://www.norman-network.com/?q=node/236
Schymanski, Aalizadeh et al. in prep; https://www.researchgate.net/project/Supporting-Mass-Spectrometry-Through-Cheminformatics
ReferencesFull Lists
17
o Now 21 lists available online … from small to large!
• Specialist collections (e.g. NormaNEWS) to market lists
• Integrated into the CompTox Chemistry Dashboard
NORMAN Suspect Exchange Lists
18
NORMAN Lists => CompTox Dashboard
https://comptox.epa.gov/dashboard/chemical_lists/normanews
http://www.norman-network.com/?q=node/236
https://comptox.epa.gov/dashboard/chemical_lists/normanews
19
NORMAN Digital Sample Freezing Platform
The next step: “Live” retrospective screening of known and
unknown chemicals in European samples (various matrices)
Aligizakis et al, in prep.
20
NORMAN Suspects don’t all have structures!
21
Small molecules … big problems?
E. coli data provided by N. Zamboni, IMSB, ETH Zürich.
o Status quo of small molecules:
• How many are in compound databases?
• How many could there be?
• How many are in spectral libraries?
• Communicating between libraries
o Exchanging “expert knowledge”
• Suspect Lists in Europe
• Live, retrospective screening & untargeted MS
o Tackling Complex Structures
• Exchanging Information on Unknowns
• …and how Open Science helps!
Cells
22
We still have many unknowns …
(l) Data from Schymanski et al 2014, ES&T DOI: 10.1021/es4044374. (r) E. coli data provided by N. Zamboni, IMSB, ETH Zürich.
Environment
Cells
23
…and many are interconnected by mass
Schymanski et al. 2014, ES&T, DOI: 10.1021/es4044374; M. Loos & H Singer, 2017. J. Cheminf. DOI: 10.1186/s13321-017-0197-z
Homologous Series in environmental and biological samples ….
S OO
OH
CH3
CH3
m
n
C9H19
O
O
S
O
O
OHm
Lipid extract data of Mycobacterium smegmatis provided by N. Zamboni, IMSB, ETHZ
24
New Ways to Store and Access Homologues
Schymanski, Grulke, Williams et al, in prep. & Williams et al. 2017 J. Cheminformatics 9:61 DOI: 10.1186/s13321-017-0247-6
https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID3020041
https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf
https://comptox.epa.gov/dashboard/
chemical_lists/eawagsurf
25
Confidence Levels for Tentative Structures
Schymanski, Jeon, Gulde, Fenner, Ruff, Singer & Hollender (2014) ES&T, 48 (4), 2097-2098. DOI: 10.1021/es5002105
o Annotation is the key to communicating confidence!
MS, MS2, RT, Reference Std.
Level 1: Confirmed structure
by reference standard
Level 2: Probable structure
a) by library spectrum match
b) by diagnostic evidence
Identification confidence
N
N
N
NHNH
CH3
CH3
S
CH3
OH
MS, MS2, Library MS2
MS, MS2, Exp. data
Example Minimum data requirements
Level 4: Unequivocal molecular formula
Level 5: Exact mass of interest
C6H5N3O4
192.0757
MS isotope/adduct
MS
Level 3: Tentative candidate(s)
structure, substituent, class MS, MS2, Exp. data
26
Annotated Spectra of Homologues in MassBank.EU
Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131
OHSO
O
CH3
O
OH
m n
SPA-9C
m+n=6
www.massbank.eu ACCESSIONS (LAS, SPACs):
Literature MS/MS LIT00034, LIT00037
Std Mix., Sample ETS00012, ETS00018https://github.com/MassBank/RMassBank/
Tentatively Identified Spectra:
http://goo.gl/0t7jGp
27
European (World-)Wide Exchange of Suspects
Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7;
NORMAN Suspect List Exchange:
http://www.norman-network.com/?q=node/236
Tentatively Identified Spectra:
http://goo.gl/0t7jGp
Hits in GNPS MassIVE datasets:
TPs in skin: http://goo.gl/NmO4tx
Surfactants: http://goo.gl/7sY9Pf
28
Small molecules … big problems OPPORTUNITIES!
o Identifying small molecules requires information from many sources
• Extensive compound databases available
• Many mass spectral libraries available
• Many excellent workflows available to collate this information
• Community initiatives to improve communication between resources
Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
http://splash.fiehnlab.ucdavis.edu/
29
Small molecules … big problems OPPORTUNITIES!
o Identifying small molecules requires information from many sources
• Extensive compound databases available
• Many mass spectral libraries available
• Many excellent workflows available to collate this information
• Community initiatives to improve communication between resources
o Exchanging expert knowledge worldwide
• Community efforts contribute greatly to improved cross-annotation
Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7; Alygizakis et al. 2018 ES&T, DOI: 10.1021/acs.est.8b00365
30
Small molecules … big problems OPPORTUNITIES!
o Identifying small molecules requires information from many sources
• Extensive compound databases available
• Many mass spectral libraries available
• Many excellent workflows available to collate this information
• Community initiatives to improve communication between resources
o Exchanging expert knowledge worldwide
• Community efforts contribute greatly to improved cross-annotation
o Tackling complex structures
• Huge progress in cheminformatics
approaches in very short time …
https://www.researchgate.net/project/Supporting-Mass-Spectrometry-Through-Cheminformatics
31
Small molecules … big problems OPPORTUNITIES!
o Identifying small molecules requires information from many sources
• Extensive compound databases available
• Many mass spectral libraries available
• Many excellent workflows available to collate this information
• Community initiatives to improve communication between resources
o Exchanging expert knowledge worldwide
• Community efforts contribute greatly to improved cross-annotation
o Tackling complex structures
• Huge progress in cheminformatics
approaches in very short time …
• Information in the public domain
helps everyone!
(you never know when it will help you!)
Open Science Viewpoint: Schymanski & Williams, 2017, ES&T, 51 (10), pp 5357–5359. DOI: 10.1021/acs.est.7b01908
32
Acknowledgements I
emma.schymanski@uni.lu
Further Information:
http://www.norman-network.com/?q=node/236
https://massbank.eu/MassBank/
https://comptox.epa.gov/dashboard/
https://www.researchgate.net/project/Supporting-Mass-
Spectrometry-Through-Cheminformatics
https://github.com/MassBank/
.eu
2.3
EU Grant
603437
Stellan
Fischer
KEMI
33
34
Searching for More Small Molecules …
http://eawag-bbd.ethz.ch/predict/aboutPPS.html
o In silico prediction and combinatorial explosion [UM-PPS]
↓
Eawag-PPS
↓
[enviPath]
35
Searching for EVEN MORE Small Molecules …
Source: A. Kerber, R. Laue, M. Meringer, C. Rücker (2005) MATCH 54 (2), 301-312.
o Structure Generation
• But of course most of these do not exist
Structures:
- elements C, H, N, O
- at least 1 C atom
- standard valencies
- no charges
- no radicals
- no stereoisomers
- only connected
structures
Molecular Mass
NumberofStructures
50 70 90 110 130 150
1100100001000000100000000
NIST MS LibraryNIST MS Library
Beilstein Registry
NIST MS Library
Beilstein Registry
Molecular Graphs
36
Metadata & The Chemical Identity Crisis
Schymanski & Williams, 2017, ES&T51 (10), pp 5357–5359. DOI: 10.1021/acs.est.7b01908
37
MetFrag2.3: Non-target Identification
Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., 2016, DOI: 10.1186/s13321-016-0115-9
Status: 2010 => 2016
5 ppm
0.001 Da
mz [M-H]-
213.9637
ChemSpider
or
PubChem± 5 ppm
2.3
RT: 4.54 min
355 InChI/RTs
References
External Refs
Data Sources
RSC Count
PubMed Count
Suspect Lists
MS/MS
134.0054 339689
150.0001 77271
213.9607 632466
Elements: C,N,S
S OO
OH
38
Automating Confidence Levels
Schymanski et al, 2014, ES&T. DOI: 10.1021/es5002105 & Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
39
RMassBank: More than “just” MassBank
Automatic MS and MS/MS
Recalibration and Clean-up
Remove interfering peaks
Spectral Annotation with
- Experimental Details
- Compound Information
https://github.com/MassBank/RMassBank/
http://bioconductor.org/packages/RMassBank/
MS/MS for
further
processing

Contenu connexe

Tendances

ASMS Fall Metabolomics Informatics Workshop 2018 Identifying Unknown Metabolites
ASMS Fall Metabolomics Informatics Workshop 2018 Identifying Unknown MetabolitesASMS Fall Metabolomics Informatics Workshop 2018 Identifying Unknown Metabolites
ASMS Fall Metabolomics Informatics Workshop 2018 Identifying Unknown MetabolitesEmma Schymanski
 
ASMS Fall 2018 Metabolomics Informatics Workshop Peak Picking
ASMS Fall 2018 Metabolomics Informatics Workshop Peak PickingASMS Fall 2018 Metabolomics Informatics Workshop Peak Picking
ASMS Fall 2018 Metabolomics Informatics Workshop Peak PickingEmma Schymanski
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSValery Tkachenko
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Valery Tkachenko
 
The royal society of chemistry and its adoption of semantic web technologies ...
The royal society of chemistry and its adoption of semantic web technologies ...The royal society of chemistry and its adoption of semantic web technologies ...
The royal society of chemistry and its adoption of semantic web technologies ...Valery Tkachenko
 

Tendances (20)

ASMS Fall Metabolomics Informatics Workshop 2018 Identifying Unknown Metabolites
ASMS Fall Metabolomics Informatics Workshop 2018 Identifying Unknown MetabolitesASMS Fall Metabolomics Informatics Workshop 2018 Identifying Unknown Metabolites
ASMS Fall Metabolomics Informatics Workshop 2018 Identifying Unknown Metabolites
 
Overview of open resources to support automated structure verification and e...
Overview of open resources to support automated structure verification  and e...Overview of open resources to support automated structure verification  and e...
Overview of open resources to support automated structure verification and e...
 
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
 
ASMS Fall 2018 Metabolomics Informatics Workshop Peak Picking
ASMS Fall 2018 Metabolomics Informatics Workshop Peak PickingASMS Fall 2018 Metabolomics Informatics Workshop Peak Picking
ASMS Fall 2018 Metabolomics Informatics Workshop Peak Picking
 
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...
 
Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...
Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...
Using the US EPA’s CompTox Chemistry Dashboard for structure identification a...
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTS
 
Development of a Tool for Systematic Integration of Traditional and New Appro...
Development of a Tool for Systematic Integration of Traditional and New Appro...Development of a Tool for Systematic Integration of Traditional and New Appro...
Development of a Tool for Systematic Integration of Traditional and New Appro...
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...
 
The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...
 
The royal society of chemistry and its adoption of semantic web technologies ...
The royal society of chemistry and its adoption of semantic web technologies ...The royal society of chemistry and its adoption of semantic web technologies ...
The royal society of chemistry and its adoption of semantic web technologies ...
 
Chemical identification of unknowns in high resolution mass spectrometry usin...
Chemical identification of unknowns in high resolution mass spectrometry usin...Chemical identification of unknowns in high resolution mass spectrometry usin...
Chemical identification of unknowns in high resolution mass spectrometry usin...
 
New developments in delivering public access to data from the National Center...
New developments in delivering public access to data from the National Center...New developments in delivering public access to data from the National Center...
New developments in delivering public access to data from the National Center...
 
Accessing information for chemicals in hydraulic fracturing fluids using the ...
Accessing information for chemicals in hydraulic fracturing fluids using the ...Accessing information for chemicals in hydraulic fracturing fluids using the ...
Accessing information for chemicals in hydraulic fracturing fluids using the ...
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
 
The influence of data curation on QSAR Modeling – examining issues of qualit...
 The influence of data curation on QSAR Modeling – examining issues of qualit... The influence of data curation on QSAR Modeling – examining issues of qualit...
The influence of data curation on QSAR Modeling – examining issues of qualit...
 
How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...
 
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 

Similaire à Small Molecules in Big Data - Analytica Munich

Exploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCHExploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCHbiocs
 
A statistical framework for multiparameter analysis at the single cell level
A statistical framework for multiparameter analysis at the single cell levelA statistical framework for multiparameter analysis at the single cell level
A statistical framework for multiparameter analysis at the single cell levelShashaanka Ashili
 
The Role of Retention Time in Untargeted Metabolomics
The Role of Retention Time in Untargeted MetabolomicsThe Role of Retention Time in Untargeted Metabolomics
The Role of Retention Time in Untargeted Metabolomics Jan Stanstrup
 
Building an efficient infrastructure, standards and data flow for metabolomics
Building an efficient infrastructure, standards and data flow for metabolomicsBuilding an efficient infrastructure, standards and data flow for metabolomics
Building an efficient infrastructure, standards and data flow for metabolomicsChristoph Steinbeck
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themRoss Mounce
 
We need to solve more that just our access problems
We need to solve more that just our access problemsWe need to solve more that just our access problems
We need to solve more that just our access problemsBjörn Brembs
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicinePaul Groth
 
Leibniz: A Digital Scientific Notation
Leibniz: A Digital Scientific NotationLeibniz: A Digital Scientific Notation
Leibniz: A Digital Scientific Notationkhinsen
 
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018David Cook
 
Open Science and Ecological meta-anlaysis
Open Science and Ecological meta-anlaysisOpen Science and Ecological meta-anlaysis
Open Science and Ecological meta-anlaysisAntica Culina
 
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...taxonbytes
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATIONONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATIONIJwest
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION dannyijwest
 
THOR Workshop - Data Publishing PLOS
THOR Workshop - Data Publishing PLOSTHOR Workshop - Data Publishing PLOS
THOR Workshop - Data Publishing PLOSMaaike Duine
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsmikaelhuss
 

Similaire à Small Molecules in Big Data - Analytica Munich (20)

Curating and Sharing Structures and Spectra for the Environmental Community
Curating and Sharing  Structures and Spectra for the Environmental CommunityCurating and Sharing  Structures and Spectra for the Environmental Community
Curating and Sharing Structures and Spectra for the Environmental Community
 
Exploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCHExploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCH
 
A statistical framework for multiparameter analysis at the single cell level
A statistical framework for multiparameter analysis at the single cell levelA statistical framework for multiparameter analysis at the single cell level
A statistical framework for multiparameter analysis at the single cell level
 
Non-targeted screening to improve substance identity for Chemical Substances ...
Non-targeted screening to improve substance identity for Chemical Substances ...Non-targeted screening to improve substance identity for Chemical Substances ...
Non-targeted screening to improve substance identity for Chemical Substances ...
 
The Role of Retention Time in Untargeted Metabolomics
The Role of Retention Time in Untargeted MetabolomicsThe Role of Retention Time in Untargeted Metabolomics
The Role of Retention Time in Untargeted Metabolomics
 
Paul Groth
Paul GrothPaul Groth
Paul Groth
 
Building an efficient infrastructure, standards and data flow for metabolomics
Building an efficient infrastructure, standards and data flow for metabolomicsBuilding an efficient infrastructure, standards and data flow for metabolomics
Building an efficient infrastructure, standards and data flow for metabolomics
 
Automated Structure Annotation and Curation for MassBank: Potential and Pitfalls
Automated Structure Annotation and Curation for MassBank: Potential and PitfallsAutomated Structure Annotation and Curation for MassBank: Potential and Pitfalls
Automated Structure Annotation and Curation for MassBank: Potential and Pitfalls
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on them
 
We need to solve more that just our access problems
We need to solve more that just our access problemsWe need to solve more that just our access problems
We need to solve more that just our access problems
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 
Leibniz: A Digital Scientific Notation
Leibniz: A Digital Scientific NotationLeibniz: A Digital Scientific Notation
Leibniz: A Digital Scientific Notation
 
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
 
Kernel-based machine learning methods
Kernel-based machine learning methodsKernel-based machine learning methods
Kernel-based machine learning methods
 
Open Science and Ecological meta-anlaysis
Open Science and Ecological meta-anlaysisOpen Science and Ecological meta-anlaysis
Open Science and Ecological meta-anlaysis
 
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATIONONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
 
THOR Workshop - Data Publishing PLOS
THOR Workshop - Data Publishing PLOSTHOR Workshop - Data Publishing PLOS
THOR Workshop - Data Publishing PLOS
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 

Dernier

ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlshansessene
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024Jene van der Heide
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests GlycosidesNandakishor Bhaurao Deshmukh
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Organic farming with special reference to vermiculture
Organic farming with special reference to vermicultureOrganic farming with special reference to vermiculture
Organic farming with special reference to vermicultureTakeleZike1
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squaresusmanzain586
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsCharlene Llagas
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 

Dernier (20)

ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girls
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Organic farming with special reference to vermiculture
Organic farming with special reference to vermicultureOrganic farming with special reference to vermiculture
Organic farming with special reference to vermiculture
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squares
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and Functions
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 

Small Molecules in Big Data - Analytica Munich

  • 1. 1 Finding Small Molecules in Big Data Emma Schymanski Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg Email: emma.schymanski@uni.lu Antony J. Williams National Centre for Computational Toxicity (NCCT), US Environmental Protection Agency (US EPA), NC, USA Image © www.seanoakley.com/ The views expressed in this presentation are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency.
  • 2. 2 Small molecules … big problems? E. coli data provided by N. Zamboni, IMSB, ETH Zürich. o Status quo of small molecules: • How many are in compound databases? • How many could there be? • How many are in spectral libraries? • Communicating between libraries o Exchanging “expert knowledge” • Suspect Lists in Europe • Live, retrospective screening & untargeted MS o Tackling Complex Structures • Exchanging Information on Unknowns • …and how Open Science helps! Cells
  • 3. 3 Searching for Small Molecules … o Compound Databases Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034 PubChem: >95 million https://pubchem.ncbi.nlm.nih.gov/ ChemSpider: >60 million http://www.chemspider.com/ CompTox Dashboard: >761 000 https://comptox.epa.gov/dashboard/ Human Metabolome DB (HMDB): >115 000 http://www.hmdb.ca/
  • 4. 4 Searching for Small Molecules … o Compound Databases … isn’t 95 million enough? Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034 Quick answer … NO! E. coli data :N. Zamboni, IMSB, ETH Zürich in silico prediction
  • 5. 5 Searching for More Small Molecules … o In silico metabolite prediction – example of MINE (2015) Jeffryes et al, 2015, MINEs, J. Cheminf, 7:44. DOI: 10.1186/s13321-015-0087-1 KEGG MINE 13,307 => 571,368 EcoCyc MINE 1,832 => 54,719 YMDB MINE 1,978 => 100,755 HMDB [15] MINE 23,035 => 400,414
  • 6. 6 Searching for More Small Molecules … o In silico metabolite prediction – example of MINE (2015) • First generation only … combinatorial explosion! Jeffryes et al, 2015, MINEs, J. Cheminf, 7:44. DOI: 10.1186/s13321-015-0087-1 KEGG MINE 13,307 => 571,368 EcoCyc MINE 1,832 => 54,719 YMDB MINE 1,978 => 100,755 HMDB [15] MINE 23,035 => 400,414 Speculation … PubChem MINE 95 million => 1.6 billion … first generation only?!?!
  • 7. 7 Searching for EVEN MORE Small Molecules … Source: A. Kerber, R. Laue, M. Meringer, C. Rücker (2005) MATCH 54 (2), 301-312. o Structure Generation • But of course most of these do not exist Molecular Mass NumberofStructures 50 70 90 110 130 150 1100100001000000100000000 NIST MS LibraryNIST MS Library Beilstein Registry NIST MS Library Beilstein Registry Molecular Graphs Structure Generation 100 million at MW 150 NIST MS Library ~1-200 at MW 150 Spectral Libraries
  • 8. 8 Searching for Small Molecules in Spectral Libraries o … to find what is “on record”… • Too many different MS/MS libraries (and they are still too small) Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
  • 9. 9 Do we need all these libraries? o Yes … most libraries still have many unique entries Vinaixa, Schymanski, Navarro, Neumann, Salek, Yanes, 2016, TrAC, DOI: 10.1016/j.trac.2015.09.005 = HMDB, GNPS, MassBank, ReSpect Compound lists provided by: S. Stein, R. Mistrik, Agilent
  • 10. 10 SPLASH – Communicate between libraries SPectraL hASH – an identifier for mass spectra splash10 - 0002 - 0900000000 - b112e4e059e1ecf98c5f [version] - [top10] - [histogram] - [hash of full spectrum] http://mona.fiehnlab.ucdavis.edu/#/spectra/splash/splash10-0002-0900000000-b112e4e059e1ecf98c5f https://www.google.ch/search?q=splash10-0002-0900000000-b112e4e059e1ecf98c5f http://splash.fiehnlab.ucdavis.edu/ Wohlgemuth et al., 2016, Nature Biotechnology 34, 1099-1101, DOI: 10.1038/nbt.3689
  • 11. 11 SPLASH – Communicate between libraries http://splash.fiehnlab.ucdavis.edu/ Wohlgemuth et al., 2016, Nature Biotechnology 34, 1099-1101, DOI: 10.1038/nbt.3689 Source: www.massbank.eu; http://gnps.ucsd.edu/, www.drugbank.ca; www.mzcloud.org
  • 12. 12 SPLASH – Communicate between libraries Wohlgemuth et al., 2016, Nature Biotechnology 34, 1099-1101, DOI: 10.1038/nbt.3689 splash10 - 0002 - 0900000000 - b112e4e059e1ecf98c5f [version] - [top10] - [histogram] - [hash of full spectrum] http://mona.fiehnlab.ucdavis.edu/#/spectra/splash/splash10-0002-0900000000-b112e4e059e1ecf98c5f https://www.google.ch/search?q=splash10-0002-0900000000-b112e4e059e1ecf98c5f
  • 13. 13 Small molecules … big problems? E. coli data provided by N. Zamboni, IMSB, ETH Zürich. o Status quo of small molecules: • How many are in compound databases? • How many could there be? • How many are in spectral libraries? • Communicating between libraries o Exchanging “expert knowledge” • Suspect Lists in Europe • Live, retrospective screening & untargeted MS o Tackling Complex Structures • Exchanging Information on Unknowns • …and how Open Science helps! Cells
  • 14. 14 2015: European Non-target Screening Trial Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7 Croatian Water RWS
  • 15. 15 European (World-)Wide Exchange of Suspects Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7; Alygizakis et al. 2018 ES&T, DOI: 10.1021/acs.est.8b00365 NORMAN Suspect List Exchange: http://www.norman-network.com/?q=node/236
  • 16. 16 NORMAN Suspect List Exchange o http://www.norman-network.com/?q=node/236 Schymanski, Aalizadeh et al. in prep; https://www.researchgate.net/project/Supporting-Mass-Spectrometry-Through-Cheminformatics ReferencesFull Lists
  • 17. 17 o Now 21 lists available online … from small to large! • Specialist collections (e.g. NormaNEWS) to market lists • Integrated into the CompTox Chemistry Dashboard NORMAN Suspect Exchange Lists
  • 18. 18 NORMAN Lists => CompTox Dashboard https://comptox.epa.gov/dashboard/chemical_lists/normanews http://www.norman-network.com/?q=node/236 https://comptox.epa.gov/dashboard/chemical_lists/normanews
  • 19. 19 NORMAN Digital Sample Freezing Platform The next step: “Live” retrospective screening of known and unknown chemicals in European samples (various matrices) Aligizakis et al, in prep.
  • 20. 20 NORMAN Suspects don’t all have structures!
  • 21. 21 Small molecules … big problems? E. coli data provided by N. Zamboni, IMSB, ETH Zürich. o Status quo of small molecules: • How many are in compound databases? • How many could there be? • How many are in spectral libraries? • Communicating between libraries o Exchanging “expert knowledge” • Suspect Lists in Europe • Live, retrospective screening & untargeted MS o Tackling Complex Structures • Exchanging Information on Unknowns • …and how Open Science helps! Cells
  • 22. 22 We still have many unknowns … (l) Data from Schymanski et al 2014, ES&T DOI: 10.1021/es4044374. (r) E. coli data provided by N. Zamboni, IMSB, ETH Zürich. Environment Cells
  • 23. 23 …and many are interconnected by mass Schymanski et al. 2014, ES&T, DOI: 10.1021/es4044374; M. Loos & H Singer, 2017. J. Cheminf. DOI: 10.1186/s13321-017-0197-z Homologous Series in environmental and biological samples …. S OO OH CH3 CH3 m n C9H19 O O S O O OHm Lipid extract data of Mycobacterium smegmatis provided by N. Zamboni, IMSB, ETHZ
  • 24. 24 New Ways to Store and Access Homologues Schymanski, Grulke, Williams et al, in prep. & Williams et al. 2017 J. Cheminformatics 9:61 DOI: 10.1186/s13321-017-0247-6 https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID3020041 https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf https://comptox.epa.gov/dashboard/ chemical_lists/eawagsurf
  • 25. 25 Confidence Levels for Tentative Structures Schymanski, Jeon, Gulde, Fenner, Ruff, Singer & Hollender (2014) ES&T, 48 (4), 2097-2098. DOI: 10.1021/es5002105 o Annotation is the key to communicating confidence! MS, MS2, RT, Reference Std. Level 1: Confirmed structure by reference standard Level 2: Probable structure a) by library spectrum match b) by diagnostic evidence Identification confidence N N N NHNH CH3 CH3 S CH3 OH MS, MS2, Library MS2 MS, MS2, Exp. data Example Minimum data requirements Level 4: Unequivocal molecular formula Level 5: Exact mass of interest C6H5N3O4 192.0757 MS isotope/adduct MS Level 3: Tentative candidate(s) structure, substituent, class MS, MS2, Exp. data
  • 26. 26 Annotated Spectra of Homologues in MassBank.EU Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131 OHSO O CH3 O OH m n SPA-9C m+n=6 www.massbank.eu ACCESSIONS (LAS, SPACs): Literature MS/MS LIT00034, LIT00037 Std Mix., Sample ETS00012, ETS00018https://github.com/MassBank/RMassBank/ Tentatively Identified Spectra: http://goo.gl/0t7jGp
  • 27. 27 European (World-)Wide Exchange of Suspects Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7; NORMAN Suspect List Exchange: http://www.norman-network.com/?q=node/236 Tentatively Identified Spectra: http://goo.gl/0t7jGp Hits in GNPS MassIVE datasets: TPs in skin: http://goo.gl/NmO4tx Surfactants: http://goo.gl/7sY9Pf
  • 28. 28 Small molecules … big problems OPPORTUNITIES! o Identifying small molecules requires information from many sources • Extensive compound databases available • Many mass spectral libraries available • Many excellent workflows available to collate this information • Community initiatives to improve communication between resources Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034 http://splash.fiehnlab.ucdavis.edu/
  • 29. 29 Small molecules … big problems OPPORTUNITIES! o Identifying small molecules requires information from many sources • Extensive compound databases available • Many mass spectral libraries available • Many excellent workflows available to collate this information • Community initiatives to improve communication between resources o Exchanging expert knowledge worldwide • Community efforts contribute greatly to improved cross-annotation Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7; Alygizakis et al. 2018 ES&T, DOI: 10.1021/acs.est.8b00365
  • 30. 30 Small molecules … big problems OPPORTUNITIES! o Identifying small molecules requires information from many sources • Extensive compound databases available • Many mass spectral libraries available • Many excellent workflows available to collate this information • Community initiatives to improve communication between resources o Exchanging expert knowledge worldwide • Community efforts contribute greatly to improved cross-annotation o Tackling complex structures • Huge progress in cheminformatics approaches in very short time … https://www.researchgate.net/project/Supporting-Mass-Spectrometry-Through-Cheminformatics
  • 31. 31 Small molecules … big problems OPPORTUNITIES! o Identifying small molecules requires information from many sources • Extensive compound databases available • Many mass spectral libraries available • Many excellent workflows available to collate this information • Community initiatives to improve communication between resources o Exchanging expert knowledge worldwide • Community efforts contribute greatly to improved cross-annotation o Tackling complex structures • Huge progress in cheminformatics approaches in very short time … • Information in the public domain helps everyone! (you never know when it will help you!) Open Science Viewpoint: Schymanski & Williams, 2017, ES&T, 51 (10), pp 5357–5359. DOI: 10.1021/acs.est.7b01908
  • 33. 33
  • 34. 34 Searching for More Small Molecules … http://eawag-bbd.ethz.ch/predict/aboutPPS.html o In silico prediction and combinatorial explosion [UM-PPS] ↓ Eawag-PPS ↓ [enviPath]
  • 35. 35 Searching for EVEN MORE Small Molecules … Source: A. Kerber, R. Laue, M. Meringer, C. Rücker (2005) MATCH 54 (2), 301-312. o Structure Generation • But of course most of these do not exist Structures: - elements C, H, N, O - at least 1 C atom - standard valencies - no charges - no radicals - no stereoisomers - only connected structures Molecular Mass NumberofStructures 50 70 90 110 130 150 1100100001000000100000000 NIST MS LibraryNIST MS Library Beilstein Registry NIST MS Library Beilstein Registry Molecular Graphs
  • 36. 36 Metadata & The Chemical Identity Crisis Schymanski & Williams, 2017, ES&T51 (10), pp 5357–5359. DOI: 10.1021/acs.est.7b01908
  • 37. 37 MetFrag2.3: Non-target Identification Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., 2016, DOI: 10.1186/s13321-016-0115-9 Status: 2010 => 2016 5 ppm 0.001 Da mz [M-H]- 213.9637 ChemSpider or PubChem± 5 ppm 2.3 RT: 4.54 min 355 InChI/RTs References External Refs Data Sources RSC Count PubMed Count Suspect Lists MS/MS 134.0054 339689 150.0001 77271 213.9607 632466 Elements: C,N,S S OO OH
  • 38. 38 Automating Confidence Levels Schymanski et al, 2014, ES&T. DOI: 10.1021/es5002105 & Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
  • 39. 39 RMassBank: More than “just” MassBank Automatic MS and MS/MS Recalibration and Clean-up Remove interfering peaks Spectral Annotation with - Experimental Details - Compound Information https://github.com/MassBank/RMassBank/ http://bioconductor.org/packages/RMassBank/ MS/MS for further processing