Talk presented at the Simons Foundation Biotech Symposium "Complex Data Visualization: Approach and Application" (12 September 2014)
http://www.simonsfoundation.org/event/complex-data-visualization-approach-and-application/
In this talk I describe how we integrated a sophisticated computational framework directly into the StratomeX visualization technique to enable rapid exploration of tens of thousands of stratifications in cancer genomics data, creating a unique and powerful tool for the identification and characterization of tumor subtypes. The tool can handle a wide range of genomic and clinical data types for cohorts with hundreds of patients. StratomeX also provides direct access to comprehensive data sets generated by The Cancer Genome Atlas Firehose analysis pipeline.
http://stratomex.caleydo.org
Visual Exploration of Clinical and Genomic Data for Patient Stratification
1. Visual Exploration of Clinical and Genomic Data for Patient Stratification
NILS GEHLENBORG
@nils_gehlenborg・http://www.gehlenborg.com
Broad Institute of MIT and Harvard
Cancer Program
Harvard Medical School
Center for Biomedical Informatics
2. Team
Alexander Lex Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA
Marc Streit Johannes Kepler University, Linz, Austria
Christian Partl Graz University of Technology, Graz, Austria
Sam Gratzl Johannes Kepler University, Linz, Austria
Dieter Schmalstieg Graz University of Technology, Graz, Austria
Hanspeter Pfister Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA
Peter J Park Harvard Medical School, Boston, MA, USA
Nils Gehlenborg Harvard Medical School, Boston, MA, USA & Broad Institute, Cambridge, MA
Special thanks to
Broad Institute TCGA Genome Data Analysis Center Team
in particular Michael S Noble, Lynda Chin & Gaddy Getz
3. Funding
Peter J Park NIH/NCI The Cancer Genome Atlas
Nils Gehlenborg NIH/NHGRI K99/R00 Pathway to Independence Award
20. Correlation with clusters based on other data types?
Different outcomes?
Mutations or copy number variants associated with clusters?
Demographic differences?
21. Challenges
How can we explore overlap of patient sets across stratifications?
How can we compare properties of patient sets within a stratification?
How can we discover “interesting” stratifications and pathways to consider?
How can we handle terabytes of clinical and genomic data in visualization tools?
22. Problem 1
Comparing Patient Sets across Stratifications
56. Problem 3
Finding “Interesting” Stratifications and Pathways
58. Is there a mutation that overlaps with this mRNA cluster?
Is there a mutually exclusive mutation?
Is there a CNV that affects survival?
Is there a pathway that is enriched in this cluster?
Query
Stratifications
Clinical Params
Pathways
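The survival question above ("Is there a CNV that affects survival?") amounts to comparing one patient set against all remaining patients. A minimal Python sketch of a one-vs-rest log-rank statistic follows. It illustrates the general technique only and is not taken from the StratomeX code base; the function name and the argument layout are assumptions made for this example.

```python
def logrank_one_vs_rest(times, events, in_set):
    """Chi-square log-rank statistic for one patient set versus the rest.

    times  -- follow-up time per patient
    events -- True if the event (e.g. death) was observed, False if censored
    in_set -- True if the patient belongs to the set of interest
    (Hypothetical argument layout chosen for this sketch.)
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Only time points at which at least one event occurred contribute.
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        n = n1 = d = d1 = 0  # at risk (all / set), events at t (all / set)
        for ti, e, g in zip(times, events, in_set):
            if ti >= t:
                n += 1
                if g:
                    n1 += 1
                if e and ti == t:
                    d += 1
                    if g:
                        d1 += 1
        # Expected events in the set under the null hypothesis: d * n1 / n.
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0  # e.g. every patient is in the same group
    return observed_minus_expected ** 2 / variance
```

A larger value indicates a stronger difference in survival between the set and the rest; under the null hypothesis the statistic approximately follows a chi-square distribution with one degree of freedom.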
61. LineUp
S Gratzl, A Lex, N Gehlenborg, H Pfister and M Streit, “LineUp: Visual Analysis of Multi-Attribute Rankings”, IEEE Transactions on Visualization and Computer Graphics 19:2277-2286 (2013)
62. Example: Clear Cell Renal Carcinoma (KIRC)
Main TCGA Paper published in Nature in 2013
First goal here: Characterize mRNA clusters
69. Queries
Retrieve Stratifications
Sets with large overlap: Jaccard Index
Similar stratifications: Adjusted Rand Index
Survival: Log Rank Score (one vs rest)
Retrieve Pathways
Gene Set Enrichment Score: original or PAGE (one vs rest)
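The two set-based scores named above can be sketched in a few lines of Python. This is an illustration of the standard definitions of the Jaccard index and the adjusted Rand index, not the StratomeX implementation; the function names and the representation of a stratification as a list of cluster labels (one per patient, in the same patient order) are assumptions made for this example.

```python
from collections import Counter
from math import comb


def jaccard_index(a, b):
    """Overlap of two patient sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two stratifications of the same patients."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both stratifications must label the same patients")
    n = len(labels_a)
    # Pairs of patients that share a cluster in both stratifications.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs that share a cluster within each stratification separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    expected = sum_a * sum_b / total_pairs if total_pairs else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both
    return (sum_cells - expected) / (max_index - expected)
```

The Jaccard index compares one displayed set with one candidate set, while the adjusted Rand index compares two complete stratifications and does not depend on how the clusters are named.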
106. Problem 4
Dealing with Terabytes of Cancer Genomics Data
107. TCGA
Data Coordination Center
Broad Institute
Genome Data Analysis Center
Standardized Data Sets
Standardized Analyses
Analysis Reports
MSKCC cBio Portal
TCGA Working Groups
StratomeX
...
108. Standardized Data Sets
Data set versioning
Format normalization
Removal of redacted data
...
Standardized Analyses
Mutation Analysis
Copy Number Analysis
Clustering
Correlations
Pathway Analysis
...
Analysis Reports
109. 102
Standardized Data Sets Standardized Analyses Analysis Reports
http://gdac.broadinstitute.org
individual downloads and view reports
firehose_get
bulk download
111. Standardized Data Sets Standardized Analyses Analysis Reports
Data Matrices + Stratifications = one Data Package per tumor type
Data Matrices:
mRNA (array & sequencing)
microRNA (array & sequencing)
methylation
reverse phase protein array
clinical parameters
Stratifications:
clustering (CNMF & hierarchical)
gene mutation status (binary)
gene copy number status (5 class)
114. Data Packages
up to 24 data and result files from 18 Firehose archives
up to 500 MB (190 MB compressed)
116. Challenges
How can we explore overlap of patient sets across stratifications?
How can we compare properties of patient sets within a stratification?
How can we discover “interesting” stratifications and pathways to consider?
How can we handle terabytes of clinical and genomic data in visualization tools?
117. CALEYDO
StratomeX is part of the Caleydo Visualization Framework
Implemented in Java, uses OpenGL and Eclipse Rich Client Platform
Binaries available for Linux, Windows, Mac OS X
Requires Java 1.7 JRE or JDK (on Mac OS X)
Open source licensed under BSD license
Source code on GitHub
118. CALEYDO
StratomeX
http://stratomex.caleydo.org
http://www.github.com/caleydo
A Lex, M Streit, H-J Schulz, C Partl, D Schmalstieg, PJ Park, N
Gehlenborg, “StratomeX: Visual Analysis of Large-Scale Heterogeneous
Genomics Data for Cancer Subtype Characterization”, Computer Graphics
Forum (EuroVis '12), 31:1175-1184 (2012)
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N
Gehlenborg, “Guided Visual Exploration of Genomic Stratifications in
Cancer”, Nature Methods 11:884–885 (2014)
120. Domino
S Gratzl, N Gehlenborg, A Lex, H Pfister and M Streit, “Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets”, IEEE Transactions on Visualization and Computer Graphics (2014)
129. Refinery Platform
Data repository based on ISA-Tab for reproducible research
Workflow execution in Galaxy
Integrated visualization tools with access to provenance
http://www.refinery-platform.org
131. [Flowchart of the Query Wizard workflow; diagram not reproduced. Labels recoverable from the figure:]
Entry point: Open Query Wizard
Actions: Add stratification, Add pathway, Add other data
Add stratification: Manually; Based on Logrank Test score (survival); Based on similarity to displayed stratification; Based on overlap with displayed set
Add pathway: Manually; Find based on differential expression in displayed set (Based on GSEA, Based on PAGE)
Add other data: Add independent column; Add dependent column; Add independent column to existing one; Stratify with displayed stratification; Display unstratified
Selection steps: Select displayed set; Select displayed stratification; Select stratification, Select pathway or Select clinical param. in LineUp view
Query steps: Execute Logrank Test query; Execute Jaccard Index query; Execute Adjusted Rand Index query; Execute GSEA query; Execute PAGE query