Cancer Genomics Visualization across Scales: Nucleotides to Cohorts
1. Cancer Genomics Visualization across
Scales: Nucleotides to Cohorts
Nils Gehlenborg, PhD
Department of Biomedical Informatics
Harvard Medical School
http://gehlenborglab.org | nils@hms.harvard.edu | @ngehlenborg
11. Characteristics
Dozens to thousands of patients
One or more samples per patient: tumor &
normal tissue, primary tumor & metastatic
tumor(s), multiple time points, etc.
Many attributes per sample: omics data,
clinical measurements, outcomes, etc.
14. StratomeX
Discovering Subtypes in Tumor Cohorts
Marc Streit, Alexander Lex, Samuel Gratzl, Christian Partl, Dieter Schmalstieg, Hanspeter Pfister,
Peter Park, Nils Gehlenborg
Guided Visual Exploration of Genomic Stratifications in Cancer
Nature Methods, 11, 884–885, 2014
Samuel Gratzl
datavisyn
Marc Streit
JKU Linz
Alexander Lex
University of Utah
25. microRNA expression
DNA methylation
protein expression
copy number variants
mutation calls
clinical parameters
mRNA expression
The Cancer Genome Atlas
10,000+ patients
20+ tumor types
43. Tumor Subtypes
PROBLEM 1
Visualize overlap of patient sets across two or more stratifications.
PROBLEM 2
Visualize characteristics of patient sets within a stratification of interest.
PROBLEM 3
Identify relevant stratifications, pathways, and clinical variables.
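Problem 1 can be illustrated with a small sketch. Assuming each stratification is a mapping from patient ID to group label, the overlap between two stratifications reduces to counting patients per pair of groups; in a block-and-ribbon layout, such counts would plausibly drive the ribbon widths. The patient IDs, group labels, and the function name `overlap_counts` are hypothetical and not taken from the talk.

```python
from collections import Counter


def overlap_counts(strat_a, strat_b):
    """Count patients shared by each pair of groups from two stratifications.

    Each stratification maps a patient ID to a group label. Patients that
    are missing from either stratification are skipped.
    """
    shared = strat_a.keys() & strat_b.keys()
    return Counter((strat_a[p], strat_b[p]) for p in shared)


# Hypothetical toy data: an mRNA clustering and a mutation status.
mrna_cluster = {"P1": "C1", "P2": "C1", "P3": "C2", "P4": "C2", "P5": "C2"}
mutation = {"P1": "mutated", "P2": "wild type", "P3": "mutated", "P4": "mutated"}

for (a, b), n in sorted(overlap_counts(mrna_cluster, mutation).items()):
    print(f"{a} / {b}: {n} patient(s)")
```

Patient P5 has no mutation call in this toy data, so it does not contribute to any pair.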
50. Is there a mutation that overlaps with this mRNA cluster?
Is there a CNV that affects survival?
Is there a pathway that is enriched in this cluster?
Is there a mutually exclusive mutation?
Query
Stratifications
Clinical Params
Pathways
Guided
Exploration
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, P Park, N Gehlenborg, Nature Methods (2014)
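Queries such as "Is there a mutation that overlaps with this mRNA cluster?" can be thought of as ranking candidate patient sets against a selected set. The sketch below uses the Jaccard index as the overlap score; this is an assumption made for illustration, and the gene names, patient IDs, and function names are hypothetical. A score of zero marks a candidate with no shared patients, which is one simple reading of "mutually exclusive".

```python
def jaccard(a, b):
    """Jaccard index of two patient sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_overlap(query, candidates):
    """Rank candidate patient sets by their overlap with the query set."""
    scores = {name: jaccard(query, members) for name, members in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one selected mRNA cluster and three mutated-patient sets.
cluster = {"P1", "P2", "P3", "P4"}
mutated = {
    "GENE_A": {"P1", "P2", "P3"},
    "GENE_B": {"P5", "P6"},
    "GENE_C": {"P4", "P5"},
}

ranking = rank_by_overlap(cluster, mutated)
for gene, score in ranking:
    print(f"{gene}: {score:.2f}")
```

The top of the ranking answers the overlap question; the bottom of the same ranking surfaces candidates for mutual exclusivity.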
56. StratomeX+
Interactive visual exploration and refinement of
cluster assignments
Michael Kern, Alexander Lex, Nils Gehlenborg, Christopher R Johnson
Interactive visual exploration and refinement of cluster assignments
BMC Bioinformatics 18:406 (2017)
Alexander Lex
University of Utah
58. Cluster Refinement
Adjust cluster (i.e. subtype) membership based on within- and between-cluster
metrics in context of other data
M Kern, A Lex, N Gehlenborg, C Johnson, BMC Bioinformatics (2017)
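As a rough illustration of what a within- versus between-cluster comparison could look like, the sketch below flags samples that lie closer to another cluster's centroid than to their own. Euclidean distance to centroids is an assumption chosen for brevity and is not claimed to be the metric used in the paper; the sample IDs, coordinates, and function names are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def suggest_reassignments(data, labels):
    """Return {sample: (current, suggested)} for samples nearer to another centroid."""
    clusters = {}
    for sample, label in labels.items():
        clusters.setdefault(label, []).append(data[sample])
    centroids = {label: centroid(pts) for label, pts in clusters.items()}

    suggestions = {}
    for sample, label in labels.items():
        # Compare the distance to the sample's own centroid with all others.
        nearest = min(centroids, key=lambda c: math.dist(data[sample], centroids[c]))
        if nearest != label:
            suggestions[sample] = (label, nearest)
    return suggestions


# Hypothetical toy data: S5 is assigned to cluster A but sits next to cluster B.
data = {
    "S1": (0.0, 0.0),
    "S2": (0.0, 1.0),
    "S3": (5.0, 5.0),
    "S4": (5.0, 6.0),
    "S5": (4.5, 5.0),
}
labels = {"S1": "A", "S2": "A", "S3": "B", "S4": "B", "S5": "A"}

print(suggest_reassignments(data, labels))
```

In an interactive tool the suggestion would only be a hint; the analyst decides whether to move the sample after inspecting it in the context of other data.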
68. Vistories
From Visual Exploration to
Storytelling and Back Again
Samuel Gratzl, Alexander Lex, Nils Gehlenborg, Nicola Cosgrove, Marc Streit
From Visual Exploration to Storytelling and Back Again
Computer Graphics Forum (EuroVis ’16) 35:491 (2016)
Samuel Gratzl
datavisyn
Marc Streit
JKU Linz
Alexander Lex
University of Utah
70. Reproducible Visual Exploration
Current Model
Exploration (Visualization Tool): produces a finding
Authoring (e.g. Illustrator): produces a figure/video
Presentation (e.g. PDF Viewer)
S Gratzl, A Lex, N Gehlenborg, N Cosgrove, M Streit, Computer Graphics Forum (2016), http://vistories.org
73. Domino
Extracting, Comparing, and Manipulating Subsets
across Tabular Datasets
Samuel Gratzl
datavisyn
Marc Streit
JKU Linz
Alexander Lex
University of Utah
80. Motivation
1. StratomeX is limited to a rigid columnar layout
2. StratomeX only shows connections on a block level, not for individual samples
3. StratomeX only supports exploration along the sample/patient dimension
90. OncoThreads
Incorporating Longitudinal Information
Theresa Harbig, Sabrina Nusrat, Alex Thomson, Hans Bitter, Tali Mazor, Ethan Cerami, Nils Gehlenborg
Visualization of Longitudinal Cancer Genomics Data
Work in Progress
Sabrina Nusrat
Harvard
Theresa Harbig
Harvard
Ethan Cerami
DFCI
Tali Mazor
DFCI
94. Motivation
1. Cohorts of patients with longitudinal sample information
2. Events between sample collection critical for interpretation
3. Application to longitudinal cancer cohorts or clinical trials
96. State of the Art
Need for visualizations of entire patient cohorts, rather than single patients, to explore temporal patterns
Miller, Christopher A., et al. "Visualizing tumor evolution with the fishplot package for R." BMC Genomics 17.1 (2016): 880.
100. Requirements
● Develop a tool for the visualization of temporal
cancer genomic data in patient cohorts
● Integrate multiple different data types
● Web-based and compatible with cBioPortal
http://www.cbioportal.org/
102. Design Sprint
Knapp, Zeratsky, and Kowitz. Sprint: How to solve big problems and test new ideas in just five days (2016)
107. Visualizing a patient over time
Two samples at different timepoints, each represented by two variables
Sample 1: Neoplasm Histologic Grade: Grade II, Mutation Count: 39
Sample 2: Neoplasm Histologic Grade: Grade IV, Mutation Count: 1226
What could explain the change?
Did the patient receive a treatment?
109. Visualizing a patient over time
Sample 1: Neoplasm Histologic Grade: Grade II, Mutation Count: 39
Sample 2: Neoplasm Histologic Grade: Grade IV, Mutation Count: 1229
Treatment events along the timeline: No TMZ, TMZ, No TMZ
Is this a common pattern?
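Whether a change such as Grade II to Grade IV is common can be checked by counting state transitions between consecutive samples across a cohort. The sketch below does this for a single categorical variable; counts of this kind could serve as link weights in a Sankey-style view. The patient IDs, grade sequences, and the function name `transition_counts` are hypothetical, and the sketch pools all consecutive sample pairs rather than separating them by timepoint.

```python
from collections import Counter


def transition_counts(timelines):
    """Count state changes between consecutive samples across all patients.

    `timelines` maps a patient ID to that patient's states in time order.
    """
    counts = Counter()
    for states in timelines.values():
        # Pair each state with the one that follows it.
        counts.update(zip(states, states[1:]))
    return counts


# Hypothetical toy cohort: histologic grade per sample, ordered by time.
grade_over_time = {
    "P1": ["Grade II", "Grade IV"],
    "P2": ["Grade II", "Grade II", "Grade III"],
    "P3": ["Grade II", "Grade IV"],
}

links = transition_counts(grade_over_time)
for (source, target), n in links.most_common():
    print(f"{source} -> {target}: {n}")
```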
129. Summary
- Temporal cancer genomic data can be visualized using temporal heatmaps and
Sankey diagrams
- Domino inspired some of our design choices
- The design sprint technique helped us develop a new concept within only five days
http://oncothreads.gehlenborglab.org
134. Take Aways
Despite highly heterogeneous data, the “block and ribbon” approaches are able to
integrate a wide range of data types
Integration of auxiliary visualization types (pathways, Kaplan-Meier plots, box plots,
etc.) extends the possibilities
Ability to aggregate data is critical to these approaches
141. Next Steps
Provide support for guided exploration in OncoThreads
Integration with other data management systems (e.g. i2b2 TranSMART, in addition to
cBioPortal)
- Challenge: generally not designed to support visualization, e.g. aggregation
- Opportunity: easier to deploy visualizations in real-world settings
Integration with analytical backends (e.g. Jupyter Notebooks or pipelines)
Better integration of specialized visualizations with support for faceting and
aggregation
142. Nils Gehlenborg, PhD
Department of Biomedical Informatics
Harvard Medical School
http://gehlenborglab.org | nils@hms.harvard.edu | @ngehlenborg
Cancer Genomics Visualization across
Scales: Nucleotides to Cohorts