1
Principles of Peak Picking and Alignment
Emma L. Schymanski
FNR ATTRACT Fellow and PI in Environmental Cheminformatics
Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg
Email: emma.schymanski@uni.lu
…and many colleagues who contributed to my science over the years!
ASMS Fall Meeting, San Francisco, California, November 29-30, 2018
Image © www.seanoakley.com/
https://tinyurl.com/asmsfall2018-peaks
How many peaks will a peak picker pick if a peak picker only picks peaks?
2
(nevertheless, I will do my best!)
DISCLAIMER!
MS1
MS2
Two very different worlds …
3
Presenting Peak Picking: Plan
o Why Peak Pick
o Terminology
• Peak Picking vs Centroid vs Profile …
o Peak Picking & Peak Pickers
• “best of” xcms and enviPick
• Peak Picking in Pictures
• Peak Picking Parameters
• Alleviating Peak Picking Parameter Panic
o Alignment ( / Profiling)
• “best of” xcms and enviMass
o Peak Picking Pointers
o Don’t just listen to me … do it!
4
Why Peak Pick (I)
Example scheme of liquid chromatography - mass spectrometry
Image © www.planetorbitrap.com/q-exactive
Sampling
Extraction (SPE)
HPLC separation
HR-MS/MS
5
Why Peak Pick (II)
This is what the output “really” looks like …
Image © www.planetorbitrap.com/q-exactive
6
Why Peak Pick (III)
Identification = turning numbers into structures
[Figure: chemical structure drawings of example compounds; source: massbank.eu]
7
TERMINOLOGY!
o Peak picking can be multi-directional, i.e.
• in mass… or time…
8
Mass: Centroid vs Profile Data (enviPat)
https://www.envipat.eawag.ch/index.php and Loos et al Anal. Chem. 87(11), 5738-5744. DOI: 10.1021/acs.analchem.5b00941
10
TERMINOLOGY!
http://proteowizard.sourceforge.net/
o Peak picking can be multi-directional (mass, time)
• Peak picking in ProteoWizard MSConvert means “centroiding” masses
(turning profile-mode data into centroided data for efficient processing)
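As a rough illustration of what “centroiding” means, the Python sketch below collapses each contiguous run of profile-mode signal into a single (m/z, intensity) pair using an intensity-weighted mean. The function names, the simple threshold-based run detection, and the toy data are assumptions made for illustration only; this is not the algorithm used by MSConvert or by any vendor software.

```python
def _finish_run(run_mz, run_int):
    """Turn one run of profile points into an (m/z centroid, summed intensity) pair."""
    total = sum(run_int)
    weighted_mz = sum(m * i for m, i in zip(run_mz, run_int)) / total
    return (weighted_mz, total)


def centroid_profile(mz, intensity, min_intensity=0.0):
    """Collapse contiguous runs of points above min_intensity into centroids.

    mz and intensity are equal-length sequences sorted by m/z.
    Returns a list of (centroid_mz, summed_intensity) tuples.
    """
    centroids = []
    run_mz, run_int = [], []
    for m, i in zip(mz, intensity):
        if i > min_intensity:
            # Still inside a profile peak: keep collecting points.
            run_mz.append(m)
            run_int.append(i)
        elif run_int:
            # Signal dropped back to the threshold: close the current run.
            centroids.append(_finish_run(run_mz, run_int))
            run_mz, run_int = [], []
    if run_int:
        centroids.append(_finish_run(run_mz, run_int))
    return centroids


# Toy profile data (hypothetical values): one symmetric peak around m/z 100.02.
toy_mz = [100.00, 100.01, 100.02, 100.03, 100.04]
toy_intensity = [0.0, 10.0, 20.0, 10.0, 0.0]
toy_centroids = centroid_profile(toy_mz, toy_intensity)
```

The five profile points are reduced to one “stick”, which is why centroided files are much smaller and faster to process than profile files.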
11
Peak Picking (in time)
Source: R. Tautenhahn, C. Böttcher, S. Neumann, BMC Bioinformatics 2008, 9:504. DOI: 10.1186/1471-2105-9-504
o Peak picking along time axis (chromatographic peaks)
12
Peak Picking
Source: R. Tautenhahn, C. Böttcher, S. Neumann, BMC Bioinformatics 2008, 9:504. DOI: 10.1186/1471-2105-9-504
o Peak picking along time axis (chromatographic peaks)
13
Peak Picking
Source: Johannes Rainer; http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html
o Peak picking along time axis (chromatographic peaks)
14
Peak Picking
Source: Johannes Rainer; http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html
o Peak picking along time axis (chromatographic peaks)
Several Samples Overlaid
Red = KO
Blue = wild type
Rectangle = chromatographic peaks identified per sample
15
Peak Picking
o Several options for peak picking
• XCMS and centWave
• Tautenhahn et al 2008 DOI: 10.1186/1471-2105-9-504
• http://bioconductor.org/packages/xcms/
• MZmine 2
• Pluskal et al 2010 DOI: 10.1186/1471-2105-11-395
• http://mzmine.github.io/
• enviPick / enviMass
• Loos 2018 DOI: 10.5281/zenodo.1213098
• http://www.looscomputing.ch/eng/enviMass/overview.htm
• Plenty of other open, research and vendor options ...
16
Peak Picking
o Result is something like this (from Formulator output):
17
Peak Picking – XCMS & XCMS Online
o http://bioconductor.org/packages/xcms/
18
Peak Picking – XCMS & XCMS Online
o https://xcmsonline.scripps.edu/
19
Peak Picking – enviMass and enviPick
o http://www.looscomputing.ch/eng/enviMass/overview.htm
o R packages …
20
Peak Picking in Pictures
http://www.looscomputing.ch/eng/enviMass/topics/peakpicking.htm
Red = peaks
Grey = noise
21
Peak Picking .. Somewhat simpler picture
http://www.looscomputing.ch/eng/enviMass/topics/peakpicking.htm
22
centWave – Gaussian with “Mexican Hat”
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-504
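To make the “Mexican hat” idea more concrete, the following Python sketch correlates a synthetic Gaussian chromatographic peak with a Mexican hat (Ricker) wavelet at a single scale and reads off the position of the strongest response. It is a minimal sketch under stated assumptions: centWave applies a continuous wavelet transform over several scales to regions of interest, whereas this example uses one fixed scale, an unnormalised wavelet, and invented data; all function names are hypothetical.

```python
import math


def mexican_hat(t, scale):
    """Unnormalised Mexican hat (Ricker) wavelet evaluated at offset t."""
    x = t / scale
    return (1.0 - x * x) * math.exp(-0.5 * x * x)


def wavelet_response(signal, scale):
    """Correlate the signal with the wavelet, centred at every index in turn."""
    half_width = int(5 * scale)  # the wavelet is negligible beyond 5 scales
    n = len(signal)
    response = []
    for center in range(n):
        acc = 0.0
        for offset in range(-half_width, half_width + 1):
            j = center + offset
            if 0 <= j < n:
                acc += signal[j] * mexican_hat(offset, scale)
        response.append(acc)
    return response


# Synthetic Gaussian peak: apex at scan index 30, sigma of 3 scans.
gaussian_peak = [100.0 * math.exp(-0.5 * ((i - 30) / 3.0) ** 2) for i in range(60)]
peak_response = wavelet_response(gaussian_peak, scale=3.0)
apex_index = max(range(len(peak_response)), key=peak_response.__getitem__)

# A flat baseline gives a response close to zero away from the edges,
# because the positive and negative lobes of the wavelet cancel out.
flat_response = wavelet_response([5.0] * 60, scale=3.0)
```

The response is largest where the wavelet lines up with the peak apex, while a constant offset contributes almost nothing, which is one reason wavelet-based detection copes reasonably well with a raised baseline.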
25
But … peaks are not perfect!
http://www.looscomputing.ch/eng/enviMass/topics/peakpicking.htm
o See enviMass website for explanation …
26
Critical Point: Separating Peaks from Baseline
27
Peak Picking Parameters
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-504
o There are a lot of options to tweak!
• I will just run through (main) centWave parameters
• enviPick is too complicated => further reading!
28
Peak Picking Parameters: centWave
ppm: maximum tolerated m/z deviation between consecutive scans, in parts per million (ppm)
NOTE: dependent on your mass spectrometer
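The ppm tolerance is a relative measure, so the same setting allows a larger absolute m/z deviation at higher masses. The short Python sketch below shows the arithmetic; the function names and the example m/z values are hypothetical and are not taken from xcms.

```python
def ppm_error(mz_observed, mz_reference):
    """Relative m/z deviation in parts per million."""
    return (mz_observed - mz_reference) / mz_reference * 1e6


def within_ppm(mz_a, mz_b, ppm):
    """True if mz_a lies within the given ppm tolerance of mz_b."""
    return abs(ppm_error(mz_a, mz_b)) <= ppm


def ppm_to_absolute(mz, ppm):
    """Absolute m/z window (in m/z units) that corresponds to a ppm tolerance."""
    return mz * ppm * 1e-6


# A deviation of 0.001 at m/z 200 is about 5 ppm ...
example_error = ppm_error(200.0010, 200.0000)
# ... while the same 5 ppm tolerance at m/z 800 spans about 0.004 m/z units.
example_window = ppm_to_absolute(800.0, 5.0)
```

A tolerance that suits a high-resolution instrument is usually far too tight for a lower-resolution one, which is why this parameter depends on the mass spectrometer.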
29
Peak Picking Parameters: centWave
peakwidth: chromatographic peak width, given as a range (min, max) in seconds
NOTE: highly dependent on your chromatography!
30
Peak Picking Parameters: centWave
snthresh: signal-to-noise ratio cutoff
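One simple way to picture a signal-to-noise cutoff is sketched below in Python: the peak height above a local baseline is divided by the spread of the surrounding background. This is only an illustrative definition under stated assumptions (mean background as baseline, sample standard deviation as noise); centWave estimates baseline and noise in its own way, and the function name and numbers here are hypothetical.

```python
import statistics


def signal_to_noise(peak_intensities, background_intensities):
    """Peak height above the mean background, divided by the background spread."""
    baseline = statistics.mean(background_intensities)
    noise = statistics.stdev(background_intensities)
    if noise == 0:
        raise ValueError("background has zero spread; S/N is undefined")
    return (max(peak_intensities) - baseline) / noise


# Hypothetical intensities: a clear peak on a quiet background.
background = [10.0, 12.0, 8.0, 10.0, 12.0, 8.0]
peak = [30.0, 80.0, 110.0, 70.0, 25.0]
sn = signal_to_noise(peak, background)

# A peak is kept only if its S/N reaches the chosen cutoff.
keep_peak = sn >= 10.0
```

Raising the cutoff removes more noise but also more low-abundance peaks, so the value is a compromise for each data set.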
31
Peak Picking Parameters: centWave
prefilter: prefilter=c(k, I). Prefilter step for the first phase. Mass traces are only retained if they contain at least k peaks with intensity >= I
[Figure annotation: only one “stick”, so this trace will fail the recommended prefilter settings]
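The prefilter rule can be written as a one-line check. The Python sketch below is a minimal illustration of the rule as stated on the slide (at least k points with intensity >= I); the function name and the example traces are hypothetical, and this is not the xcms implementation.

```python
def passes_prefilter(trace_intensities, k, min_intensity):
    """True if the mass trace has at least k points with intensity >= min_intensity."""
    n_above = sum(1 for value in trace_intensities if value >= min_intensity)
    return n_above >= k


# A trace with several consecutive intense points is retained ...
good_trace = [50.0, 200.0, 300.0, 250.0, 40.0]
# ... but a single intense "stick" is discarded, however large it is.
single_stick = [5000.0]

good_kept = passes_prefilter(good_trace, k=3, min_intensity=100.0)
stick_kept = passes_prefilter(single_stick, k=3, min_intensity=100.0)
```

Because the check is so cheap, it removes most noise traces before the more expensive wavelet step is run.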
32
Too Many Peak Picking Parameters ???????
https://bioconductor.org/packages/release/bioc/vignettes/IPO/inst/doc/IPO.html
o IPO to the rescue!
o Parameter optimization for xcms-based workflows …
o Libiseller et al 2015, DOI: 10.1186/s12859-015-0562-8
IPO = Isotopologue Parameter Optimization
34
RECAP: Why Peak Pick?
Identification = turning numbers into structures
[Figure: chemical structure drawings of example compounds; source: massbank.eu]
35
o Instruments change over time …
o Before we can do fancy statistics, we need to make sure
our samples are comparable!
36
Alignment
http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html#3_initial_data_inspection
o Alignment / Profiling => which peaks belong together
across large sample sets?
37
Alignment
http://www.looscomputing.ch/eng/enviMass/topics/profiling.htm
o “Profiling” in enviMass
38
Alignment ~= Retention Time Correction
http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html#3_initial_data_inspection
o Many algorithms and methods …
o Before:
39
Alignment ~= Retention Time Correction
http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html#5_alignment
o Many algorithms and methods …
o After (Obiwarp algorithm in xcms)
40
Before Alignment
After Alignment
41
Changes over samples
http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html#5_alignment
o Difference between adjusted and raw retention times
along the retention time axis
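The difference between adjusted and raw retention times can be illustrated with a deliberately simple correction. The Python sketch below maps raw retention times onto a reference time axis by linear interpolation between a few landmark peaks and then reports adjusted minus raw. This is an assumption-laden stand-in chosen for brevity, not the Obiwarp algorithm used in xcms; the function name and the landmark values are hypothetical.

```python
from bisect import bisect_right


def adjust_rt(rt, anchors_raw, anchors_ref):
    """Map a raw retention time onto the reference axis.

    anchors_raw and anchors_ref are equal-length, ascending lists holding the
    retention times of the same landmark peaks in the sample and in the
    reference. Outside the anchor range the nearest anchor's shift is reused.
    """
    if rt <= anchors_raw[0]:
        return rt + (anchors_ref[0] - anchors_raw[0])
    if rt >= anchors_raw[-1]:
        return rt + (anchors_ref[-1] - anchors_raw[-1])
    i = bisect_right(anchors_raw, rt) - 1
    fraction = (rt - anchors_raw[i]) / (anchors_raw[i + 1] - anchors_raw[i])
    return anchors_ref[i] + fraction * (anchors_ref[i + 1] - anchors_ref[i])


# Hypothetical landmark peaks (seconds) in one sample and in the reference.
raw_anchors = [100.0, 300.0, 500.0]
ref_anchors = [102.0, 300.0, 496.0]

raw_times = [50.0, 200.0, 300.0, 400.0, 600.0]
adjusted_times = [adjust_rt(rt, raw_anchors, ref_anchors) for rt in raw_times]

# Adjusted minus raw retention time, i.e. the size of the correction per point.
rt_shifts = [adj - raw for adj, raw in zip(adjusted_times, raw_times)]
```

Plotting such shifts against retention time for every sample gives the kind of overview referred to above and quickly reveals samples that needed an unusually large correction.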
42
Some advice …
o Peak pickers are designed to pick the perfect peak
• But life is never perfect and peaks are no different
o Pick the peak picker that is best for your situation
• Convenience, ease of use, designed for your data, …
• The optimal choice is usually a compromise
o Be sceptical (visualise your data, reality check it, etc.)
• But don’t go overboard in evaluating peak pickers … remember
your (real) goal …
43
Peak Picking Overlap (centWave paper)
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-504
44
Verify with EIC Extraction [these are NOT picked]
https://github.com/schymane/ReSOLUTION/blob/master/R/RMB_EIC_prescreen.R
No peak at all
Nice peak, MSMS
Peak, no MSMS
Noise with MSMS (careful!)
Isobars with MSMS (careful!)*
Looking for chemicals known
to be present in the sample
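Extracting an ion chromatogram (EIC) needs no peak picking at all: for each scan, keep the most intense signal inside a narrow m/z window around the target mass. The Python sketch below shows the idea on invented data; the scan layout, function name, and values are assumptions for illustration and do not reproduce the linked R script.

```python
def extract_eic(scans, target_mz, ppm=5.0):
    """Return (retention_time, intensity) pairs for one target m/z.

    scans is a list of (retention_time, peaks) tuples, where peaks is a list
    of (mz, intensity) pairs. Scans without a matching signal get intensity 0.
    """
    tolerance = target_mz * ppm * 1e-6  # absolute m/z window from the ppm value
    eic = []
    for retention_time, peaks in scans:
        hits = [intensity for mz, intensity in peaks if abs(mz - target_mz) <= tolerance]
        eic.append((retention_time, max(hits) if hits else 0.0))
    return eic


# Hypothetical scans: the third scan only contains a signal outside the window.
toy_scans = [
    (10.0, [(200.0004, 50.0), (250.0000, 999.0)]),
    (11.0, [(199.9996, 400.0)]),
    (12.0, [(200.0100, 800.0)]),
]
toy_eic = extract_eic(toy_scans, target_mz=200.0, ppm=5.0)
```

Looking at such traces for chemicals known to be present is a quick reality check on whatever the peak picker reports.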
45
Just because you find a peak …
ENTACT Project: https://www.epa.gov/sites/production/files/2018-06/documents/comptox_cop_6-28-18.pdf
o Mix 505: One candidate with this mass/formula
• DTXSID9040001, C9H8O4
o One chemical…
How many peaks?
46
…doesn’t mean it’s your compound of interest!
47
Beware of artefacts!
o Your results also depend on the acquisition data!
48
Further reading DOING! [Vendor independent]
o Don’t just take my word for it … don’t just read about it
… DO IT. There are so many ways to try it out …
complete with sample data! [Open Science!]
o http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html
o http://www.looscomputing.ch/eng/enviMass/overview.htm
o An interface that many enjoy, likely comes with example
data but requires a login …
o https://xcmsonline.scripps.edu/
49
Further reading DOING! [Vendor independent]
o http://mzmine.github.io/
o http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/
o MS-DIAL
50
Acknowledgements
emma.schymanski@uni.lu
Further Information:
http://bioconductor.org/packages/xcms/
http://www.looscomputing.ch/eng/enviMass/overview.htm
https://xcmsonline.scripps.edu/
http://mzmine.github.io/
EU Grant 603437
The CompMS Community (proxy photo)
51
Extra Slides
52
Quality Control of Data
Slide c/o Michael Stravs
o Always visualise results … never take anything for granted
53
Homologues: Challenge Peak Pickers but are Present!
Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131
[Figure: chemical structure of SPA-9C, m+n=6]
www.massbank.eu ACCESSIONS (LAS, SPACs):
Literature MS/MS: LIT00034, LIT00037
Std Mix., Sample: ETS00012, ETS00018
https://github.com/MassBank/RMassBank/
Tentatively Identified Spectra: http://goo.gl/0t7jGp
54
Be wary of instrument specific phenomena!
o R package nontarget: satellite peak removal
55
Be wary of instrument specific phenomena II
o Orbitrap-specific calibration issues (not observed in TOF)

ASMS Fall 2018 Metabolomics Informatics Workshop Peak Picking

  • 1. Principles of Peak Picking and Alignment. Emma L. Schymanski, FNR ATTRACT Fellow and PI in Environmental Cheminformatics, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg. Email: emma.schymanski@uni.lu …and many colleagues who contributed to my science over the years! ASMS Fall Meeting, San Francisco, California, November 29-30, 2018. Image © www.seanoakley.com/ https://tinyurl.com/asmsfall2018-peaks How many peaks will a peak picker pick if a peak picker only picks peaks?
  • 2. (nevertheless, I will do my best!) DISCLAIMER! MS1, MS2: two very different worlds …
  • 3. Presenting Peak Picking: Plan o Why Peak Pick o Terminology • Peak Picking vs Centroid vs Profile … o Peak Picking & Peak Pickers • “best of” xcms and enviPick • Peak Picking in Pictures • Peak Picking Parameters • Alleviating Peak Picking Parameter Panic o Alignment ( / Profiling) • “best of” xcms and enviMass o Peak Picking Pointers o Don’t just listen to me … do it!
  • 4. Why Peak Pick (I) Example scheme of liquid chromatography - mass spectrometry: Sampling, Extraction (SPE), HPLC separation, HR-MS/MS. Image © www.planetorbitrap.com/q-exactive
  • 5. Why Peak Pick (II) This is what the output “really” looks like … Image © www.planetorbitrap.com/q-exactive
  • 6. Why Peak Pick (III) Identification = turning numbers into structures [figure: chemical structure drawings] massbank.eu
  • 7. TERMINOLOGY! o Peak picking can be multi-directional, i.e. • in mass… or time…
  • 8. Mass: Centroid vs Profile Data (enviPat) https://www.envipat.eawag.ch/index.php and Loos et al Anal. Chem. 87(11), 5738-5744. DOI: 10.1021/acs.analchem.5b00941
  • 9. Mass: Centroid vs Profile Data (enviPat) https://www.envipat.eawag.ch/index.php and Loos et al Anal. Chem. 87(11), 5738-5744. DOI: 10.1021/acs.analchem.5b00941
  • 10. TERMINOLOGY! http://proteowizard.sourceforge.net/ o Peak picking can be multi-directional (mass, time) • Peak picking in ProteoWizard MSConvert is “centroiding” masses (turning profile mode data into centroided data for efficient processing)
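To make the centroiding idea on slide 10 concrete, here is a minimal Python sketch. It is not what MSConvert does internally; the vendor and ProteoWizard algorithms are considerably more elaborate. It assumes a profile spectrum given as parallel lists of m/z and intensity, and reduces each contiguous run of non-zero points to one "stick" with an intensity-weighted mean m/z and the summed intensity. The function names and the toy data are hypothetical.

```python
def _flush(run_mz, run_int, centroids):
    """Turn one contiguous profile run into a single (m/z, intensity) stick."""
    total = sum(run_int)
    weighted_mz = sum(m * i for m, i in zip(run_mz, run_int)) / total
    centroids.append((weighted_mz, total))


def centroid_profile(mz, intensity, min_intensity=0.0):
    """Collapse each run of points above min_intensity into one centroid."""
    centroids = []
    run_mz, run_int = [], []
    for m, i in zip(mz, intensity):
        if i > min_intensity:
            run_mz.append(m)
            run_int.append(i)
        elif run_int:
            # the run has ended: emit its centroid and start over
            _flush(run_mz, run_int, centroids)
            run_mz, run_int = [], []
    if run_int:
        _flush(run_mz, run_int, centroids)
    return centroids


# Toy profile spectrum (hypothetical values): two profile peaks
profile_mz = [200.000, 200.001, 200.002, 200.003, 200.004, 200.010, 200.011, 200.012]
profile_int = [0.0, 50.0, 100.0, 50.0, 0.0, 0.0, 20.0, 0.0]
centroids = centroid_profile(profile_mz, profile_int)
```

Eight profile points become two centroids, which is where the "efficient processing" on the slide comes from.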
  • 11. Peak Picking (in time) Source: R. Tautenhahn, C. Böttcher, S. Neumann, BMC Bioinformatics 2008, 9:504. DOI: 10.1186/1471-2105-9-504 o Peak picking along time axis (chromatographic peaks)
  • 12. Peak Picking Source: R. Tautenhahn, C. Böttcher, S. Neumann, BMC Bioinformatics 2008, 9:504. DOI: 10.1186/1471-2105-9-504 o Peak picking along time axis (chromatographic peaks)
  • 13. Peak Picking Source: Johannes Rainer; http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html o Peak picking along time axis (chromatographic peaks)
  • 14. Peak Picking Source: Johannes Rainer; http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html o Peak picking along time axis (chromatographic peaks) Several samples overlaid: Red = KO; Blue = wild type; Rectangle = chromatographic peaks identified per sample
  • 15. Peak Picking o Several options for peak picking • XCMS and centWave • Tautenhahn et al 2008 DOI: 10.1186/1471-2105-9-504 • http://bioconductor.org/packages/xcms/ • MZmine 2 • Pluskal et al 2010 DOI: 10.1186/1471-2105-11-395 • http://mzmine.github.io/ • enviPick / enviMass • Loos 2018 DOI: 10.5281/zenodo.1213098 • http://www.looscomputing.ch/eng/enviMass/overview.htm • Plenty of other open, research and vendor options ...
  • 16. Peak Picking o Result is something like this (from Formulator output):
  • 17. Peak Picking – XCMS & XCMS Online o http://bioconductor.org/packages/xcms/
  • 18. Peak Picking – XCMS & XCMS Online o https://xcmsonline.scripps.edu/
  • 19. Peak Picking – enviMass and enviPick o http://www.looscomputing.ch/eng/enviMass/overview.htm o R packages …
  • 20. Peak Picking in Pictures http://www.looscomputing.ch/eng/enviMass/topics/peakpicking.htm Red = peaks; Grey = noise
  • 21. Peak Picking .. Somewhat simpler picture http://www.looscomputing.ch/eng/enviMass/topics/peakpicking.htm
  • 22. centWave – Gaussian with “Mexican Hat” https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-504
  • 23. centWave – Gaussian with “Mexican Hat” https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-504
  • 24. centWave – Gaussian with “Mexican Hat” https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-504
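The “Mexican Hat” idea on slides 22-24 can be illustrated with a small Python sketch. This is not the centWave implementation: centWave applies a continuous wavelet transform over several scales to regions of interest, whereas the sketch below assumes a single, fixed scale and a toy chromatogram (a Gaussian peak on a flat baseline). All names and values are hypothetical. It shows why the wavelet is useful: the response is strongly positive at a Gaussian-shaped apex, negative on the shoulders, and close to zero on a flat baseline.

```python
import math


def mexican_hat(t, scale):
    """Unnormalised Mexican hat (Ricker) wavelet evaluated at offset t."""
    x = t / scale
    return (1.0 - x * x) * math.exp(-0.5 * x * x)


def wavelet_response(signal, scale):
    """Correlate the signal with a Mexican hat of one fixed scale."""
    half = int(5 * scale)  # kernel truncated at +/- 5 scales
    kernel = [mexican_hat(k, scale) for k in range(-half, half + 1)]
    n = len(signal)
    response = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:  # points outside the trace are ignored
                acc += signal[idx] * w
        response.append(acc)
    return response


# Toy chromatogram: baseline of 10 plus a Gaussian peak at scan 40 (sigma = 3 scans)
signal = [10.0 + 100.0 * math.exp(-0.5 * ((i - 40) / 3.0) ** 2) for i in range(80)]
response = wavelet_response(signal, scale=3.0)
apex = max(range(len(response)), key=response.__getitem__)
```

Because the wavelet sums to approximately zero, a constant baseline contributes almost nothing away from the edges of the trace, so the largest response marks the peak apex rather than the baseline level.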
  • 25. But … peaks are not perfect! http://www.looscomputing.ch/eng/enviMass/topics/peakpicking.htm o See enviMass website for explanation …
  • 26. Critical Point: Separating Peaks from Baseline
  • 27. Peak Picking Parameters https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-504 o There are a lot of options to tweak! • I will just run through (main) centWave parameters • enviPick is too complicated => further reading!
  • 28. Peak Picking Parameters: centWave ppm: maximal tolerated m/z deviation in consecutive scans, in ppm (parts per million). NOTE: dependent on your mass spectrometer
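As a worked example of what a ppm tolerance means in absolute m/z units, here is a short Python sketch. The helper names are hypothetical and this is plain arithmetic, not xcms code: a deviation of p ppm at a given m/z corresponds to m/z × p / 10^6.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a +/- ppm tolerance around mz."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(mz_ref, mz_obs, ppm):
    """True if mz_obs deviates from mz_ref by no more than the ppm tolerance."""
    return abs(mz_obs - mz_ref) <= mz_ref * ppm / 1e6


# 5 ppm at m/z 300 is +/- 0.0015 m/z units
low, high = ppm_window(300.0, 5.0)
close_enough = within_ppm(300.0, 300.001, 5.0)  # deviation of 0.001
too_far = within_ppm(300.0, 300.002, 5.0)       # deviation of 0.002
```

The same ppm value is a wider absolute window at higher m/z, which is one reason the appropriate setting depends on the instrument and its mass accuracy.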
  • 29. Peak Picking Parameters: centWave peakwidth: chromatographic peak width, given as range (min,max) in seconds. NOTE: highly dependent on your chromatography!
  • 30. Peak Picking Parameters: centWave snthresh: signal to noise ratio cutoff
  • 31. Peak Picking Parameters: centWave prefilter: prefilter=c(k,I). Prefilter step for the first phase. Mass traces are only retained if they contain at least k peaks with intensity >= I. Only one “stick” so will fail recommended prefilter settings
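The prefilter rule on slide 31 can be written down in a few lines. The following Python sketch only restates the rule as given on the slide (keep a mass trace if at least k of its points have intensity >= I); it is not the xcms source code, and the function name and toy traces are hypothetical.

```python
def passes_prefilter(intensities, k, min_intensity):
    """Keep a mass trace only if at least k points reach min_intensity."""
    n_strong = sum(1 for i in intensities if i >= min_intensity)
    return n_strong >= k


# Toy mass traces (hypothetical intensities per scan)
good_trace = [50.0, 400.0, 1200.0, 900.0, 150.0, 20.0]
single_stick = [5000.0]  # one intense point only

keep_good = passes_prefilter(good_trace, k=3, min_intensity=100.0)
keep_stick = passes_prefilter(single_stick, k=3, min_intensity=100.0)
```

The single “stick” is rejected however intense it is, because it can never supply k qualifying points, which matches the remark on the slide.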
  • 32. Too Many Peak Picking Parameters ??????? https://bioconductor.org/packages/release/bioc/vignettes/IPO/inst/doc/IPO.html o IPO to the rescue! o Parameter optimization for xcms-based workflows … o Libiseller et al 2015, DOI: 10.1186/s12859-015-0562-8 IPO = Isotopologue Parameter Optimization
  • 33. Too Many Peak Picking Parameters ???????
  • 34. RECAP: Why Peak Pick? Identification = turning numbers into structures [figure: chemical structure drawings] massbank.eu
  • 35. o Instruments change over time … o Before we can do fancy statistics, we need to make sure our samples are comparable!
  • 38. Alignment ~= Retention Time Correction http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html#3_initial_data_inspection o Many algorithms and methods … o Before:
  • 39. Alignment ~= Retention Time Correction http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html#5_alignment o Many algorithms and methods … o After (Obiwarp algorithm in xcms)
  • 41. Changes over samples http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html#5_alignment o Difference between adjusted and raw retention times along the retention time axis
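The quantity plotted on slide 41 is simply the adjusted retention time minus the raw retention time at each point along the run. A minimal Python sketch of that arithmetic follows. The retention times are invented for illustration; in practice the adjusted values would come from an alignment method such as Obiwarp, which is not implemented here.

```python
def rt_deviation(raw_rt, adjusted_rt):
    """Per-scan difference (adjusted - raw) in the same units as the input."""
    if len(raw_rt) != len(adjusted_rt):
        raise ValueError("raw and adjusted retention times must have equal length")
    return [adj - raw for raw, adj in zip(raw_rt, adjusted_rt)]


# Toy retention times in seconds for three scans of one sample (hypothetical)
raw = [60.0, 120.0, 180.0]
adjusted = [60.5, 119.0, 180.25]

deviation = rt_deviation(raw, adjusted)
largest_shift = max(abs(d) for d in deviation)
```

Plotting the deviation against the raw retention time for every sample gives the kind of overview shown on the slide, and unusually large shifts are worth inspecting before trusting the alignment.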
  • 42. Some advice … o Peak pickers are designed to pick the perfect peak • But life is never perfect and peaks are no different o Pick the peak picker that is best for your situation • Convenience, ease of use, designed for your data, … • The optimal choice is usually a compromise o Be sceptical (visualise your data, reality check it, etc.) • But don’t go overboard in evaluating peak pickers … remember your (real) goal …
  • 43. Peak Picking Overlap (centWave paper) https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-504
  • 44. Verify with EIC Extraction [these are NOT picked] https://github.com/schymane/ReSOLUTION/blob/master/R/RMB_EIC_prescreen.R Looking for chemicals known to be present in the sample. Panels: No peak at all; Nice peak, MSMS; Peak, no MSMS; Noise with MSMS (careful!); Isobars with MSMS (careful!)*
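Extracting an ion chromatogram (EIC) for a known m/z, as recommended on slide 44, needs no peak picking at all. The Python sketch below shows the principle only and is not a port of the linked RMB_EIC_prescreen.R script. It assumes centroided scans held in memory as (retention time, list of (m/z, intensity)) tuples, and it takes the most intense point inside a ppm window in each scan. The function name, the data layout and the toy scans are hypothetical.

```python
def extract_eic(scans, target_mz, ppm):
    """Return [(rt, intensity)] using the most intense point within +/- ppm of target_mz."""
    tol = target_mz * ppm / 1e6
    eic = []
    for rt, peaks in scans:
        in_window = [inten for mz, inten in peaks if abs(mz - target_mz) <= tol]
        # scans with nothing in the window contribute zero intensity
        eic.append((rt, max(in_window, default=0.0)))
    return eic


# Toy centroided scans (hypothetical): (retention time in s, [(m/z, intensity), ...])
scans = [
    (10.0, [(250.0002, 100.0), (300.0000, 5000.0)]),
    (11.0, [(250.0001, 900.0), (250.0003, 400.0)]),
    (12.0, [(251.0000, 50.0)]),
]
eic = extract_eic(scans, target_mz=250.0, ppm=5.0)
```

Plotting the resulting trace for each expected compound is a quick reality check on whether a missing feature was absent from the data or merely missed by the peak picker.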
  • 45. Just because you find a peak … ENTACT Project: https://www.epa.gov/sites/production/files/2018-06/documents/comptox_cop_6-28-18.pdf o Mix 505: One candidate with this mass/formula • DTXSID9040001, C9H8O4 o One chemical… How many peaks?
  • 46. …doesn’t mean it’s your compound of interest!
  • 47. Beware of artefacts! o Your results also depend on the acquisition data!
  • 48. Further reading DOING! [Vendor independent] o Don’t just take my word for it … don’t just read about it … DO IT. There are so many ways to try it out … complete with sample data! [Open Science!] o http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html o http://www.looscomputing.ch/eng/enviMass/overview.htm o An interface that many enjoy, likely comes with example data but requires a login … o https://xcmsonline.scripps.edu/
  • 49. Further reading DOING! [Vendor independent] o http://mzmine.github.io/ o http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/ o MS-DIAL
  • 52. Quality Control of Data Slide c/o Michael Stravs o Always visualise results … never take anything for granted
  • 53. Homologues: Challenge Peak Pickers but are Present! Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131 [figure: chemical structure, SPA-9C, m+n=6] www.massbank.eu ACCESSIONS (LAS, SPACs): Literature MS/MS LIT00034, LIT00037; Std Mix., Sample ETS00012, ETS00018 https://github.com/MassBank/RMassBank/ Tentatively Identified Spectra: http://goo.gl/0t7jGp
  • 54. Be wary of instrument specific phenomena! o R package nontarget: satellite peak removal
  • 55. Be wary of instrument specific phenomena II o Orbitrap-specific calibration issues (not observed in TOF)