Molecular and data visualization in drug discovery
1. Molecular and Data Visualization in Drug Discovery
Deepak Bandyopadhyay
GlaxoSmithKline
2. Intro: Human Body & Disease Biology
• Disease (from Wikipedia):
– Abnormal condition that affects part or all of an organism.
– Associated with specific symptoms and signs.
• Causes:
– Single cause, e.g. pathogen, poison, nutrient deficiency, genetics
– Multiple factors including environment, lifestyle, genetics
[Figures: Mycobacterium tuberculosis; chest X-ray showing lung cancer]
http://www.biologyguide.net/biol1/1_disease.htm
3. Drug Discovery Parts/Timeline
Focus of Drug Discovery
• Narrow down to one or a few substances to test in humans and develop into a drug that treats a disease
Components:
• Target Selection and Validation
• In Vitro Biology
• Medicinal Chemistry (Lead Optimization)
• Lead Discovery (a.k.a. Screening)
• In Vivo Biology
[Diagram labels: genome, protein, link to disease, disease genetics, pathology, biological target]
4. Molecular and Data Visualization
• The two parts of my job at GSK!
• Molecules:
– small (drugs/peptides) and large (proteins/DNA/RNA/lipids)
– visualized in 1D (SMILES), 2D (structure), 3D (coords / conformations), 4D (Mol. Dynamics)
• Data:
– Format: numeric / text, continuous / categorical, delimited / database / XML / proprietary
– Source: instruments, manual entry, calculation
– About drug discovery projects (key: molecule ID), genomics/proteomics (key: gene/protein ID), clinical studies (key: anon. patient ID), …
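Because each data source carries a key (molecule ID, gene/protein ID, anonymized patient ID), combining sources amounts to a join on that key. A minimal Python sketch, assuming two hypothetical tables keyed by a made-up molecule ID; the IDs, column names and values are illustrative and not real project data:

```python
# Hypothetical records keyed by molecule ID (all IDs and values are made up).
measured = {
    "MOL-001": {"pIC50": 7.9},
    "MOL-002": {"pIC50": 6.4},
}
calculated = {
    "MOL-001": {"mol_weight": 206.3, "hbd": 1},
    "MOL-003": {"mol_weight": 180.2, "hbd": 1},
}

def join_by_key(*tables):
    """Outer-join any number of {key: {column: value}} tables on their key."""
    merged = {}
    for table in tables:
        for key, columns in table.items():
            # Create the row on first sight of a key, then add this table's columns.
            merged.setdefault(key, {}).update(columns)
    return merged

rows = join_by_key(measured, calculated)
for mol_id in sorted(rows):
    print(mol_id, rows[mol_id])
```

Molecules missing from one source (here MOL-002 and MOL-003) keep only the columns they have, which is typical when measured and calculated data arrive from different systems.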
[Figures: Ibuprofen (drug); EGFR (protein) shown as ball-and-stick and as ribbons]
5. Movie: Introduction to Drug Design
By Schrödinger (molecular modeling software company): https://www.youtube.com/watch?v=u49k72rUdyc
7. Molecular Visualization Deconstructed
• Representations
• Navigation
• Interaction
• What would you add?
[Figure, representations: Aspirin (ligand) in COX-1 (protein); binding pocket surface labeled polar, +ve charge, hydrophobic, -ve charge]
[Figure, navigation: XY translate, Z zoom; rotate about X/Y or Z]
[Figure, interaction, e.g. in the program MOE: F1, F2, F3 save/restore scenes; Select, Hide/Show, Center, Prev/Next Scene, Expand Sel., Import/Export, Align, Compute…]
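The navigation operations named on this slide (translate in XY, rotate about an axis) are plain coordinate transforms. A minimal sketch in Python, assuming atoms are stored as (x, y, z) tuples; the function names and coordinates are hypothetical, and this is not how MOE or any particular viewer implements navigation:

```python
import math

def rotate_z(coords, angle_deg):
    """Rotate a list of (x, y, z) points about the Z axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

def translate_xy(coords, dx, dy):
    """Shift every point by (dx, dy) in the XY plane; Z is left unchanged."""
    return [(x + dx, y + dy, z) for x, y, z in coords]

# Two made-up atom positions.
atoms = [(1.0, 0.0, 0.0), (0.0, 1.0, 2.0)]

# Rotate the scene by 90 degrees about Z, then pan it by 0.5 along X.
view = translate_xy(rotate_z(atoms, 90.0), 0.5, 0.0)
print(view)
```

Real viewers usually compose such transforms into a single 4×4 matrix and apply it on the GPU; the per-point version above only shows the arithmetic.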
8. Purposes of Molecule Visualization
• Understand and rationalize “SAR” in 3D
• (Protein) Structure-Based Drug Design. E.g.:
– Aspirin binds COX-1 and COX-2; Celebrex binds COX-2 only
• Clearly illustrate biological systems / processes
• What other tasks can you think of?
9. Case study 1: Protein-Protein Interactions
HIV-1 coat protein gp120 bound to antibody 17b (Light, Heavy) and CD4
[Figure panels: gp120/CD4 interface; gp120/antibody L/H interface. Surfaces colored by rank; the color legend is not recoverable.]
Ban, Y. E. A., Edelsbrunner, H., & Rudolph, J. (2006). Interface surfaces for protein-protein complexes. J. ACM, 53(3), 361-378.
10. Case study 2: Molecular Dynamics Simulation
of a drug entering the binding site of a target protein
Decherchi et al., Nature Comms. 6(6155), 2015. https://www.youtube.com/watch?v=ckTqh50r_2w
11. From Molecules to Data
Mol spreadsheets, visualizations
StarDrop Glowing Molecules™ image from http://www.asteris-app.com/technical-info.htm
Hybrid molecule/data visualization
12. Software Systems: Spotfire
• Feature set / distinguishing factors:
– Handling large datasets via filtering and memory management
– Tabular file (CSV, Excel) or database input
– Multiple, configurable visualization types
– Easy enough for domain experts to use / share
– Life science add-ons
• Molecule depiction
• Specialized –omics packages
[Figures: binned pIC50 trellised by HBA and HBD; pIC50 vs. % inhibition]
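pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so a 10 nM compound has a pIC50 of 8. The binned, trellised view mentioned above can be sketched in plain Python as one histogram per (HBA, HBD) panel. The compounds, the bin width of one log unit and the function names are assumptions for illustration; this is not Spotfire's API:

```python
import math
from collections import Counter

def pic50_from_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L), written as 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nm)

def bin_label(pic50, width=1.0):
    """Label the half-open bin [lo, lo + width) that contains pic50."""
    lo = math.floor(pic50 / width) * width
    return f"{lo:g}-{lo + width:g}"

# Hypothetical compounds: (IC50 in nM, H-bond acceptors, H-bond donors).
compounds = [(20.0, 4, 1), (25.0, 4, 1), (300.0, 2, 0), (5000.0, 2, 0)]

# One histogram (Counter of bin label -> count) per trellis panel (HBA, HBD).
panels = {}
for ic50_nm, hba, hbd in compounds:
    label = bin_label(pic50_from_nm(ic50_nm))
    panels.setdefault((hba, hbd), Counter())[label] += 1

for panel in sorted(panels):
    print("HBA=%d HBD=%d" % panel, dict(panels[panel]))
```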
13. Software Systems: LiveDesign
• Consolidate multiple disconnected tools for molecule design
– Integrated Single Platform
– Intuitive UI
– 2D, 3D, Data & Visuals
– Social aspect
14. Dimensions, dimensions…
• Molecules: 1D (SMILES e.g. c1ccccc1), 2D (depiction), 3D (coords), 4D (motion)
• Data:
– 100s of activities, measured and predicted properties per row (compound)
– ~100K for gene expression, clinical trial data
– Millions for –omics, next-gen sequencing
– Then there’s systems biology…
• Dimensionality reduction is a key capability
– PCA, SOM, Stochastic Proximity Embedding,…
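Of the methods listed, Stochastic Proximity Embedding is simple enough to sketch in a few lines: start from random low-dimensional coordinates, repeatedly pick a random pair of points, and nudge them so their embedded distance moves toward their original distance, while the learning rate decays. The version below is a minimal pure-Python sketch; the learning-rate schedule, cycle counts, seed and toy data are assumptions, not tuned or reference settings:

```python
import math
import random

def euclid(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def stress(data, coords):
    """Sum of squared differences between original and embedded pairwise distances."""
    n = len(data)
    return sum(
        (euclid(data[i], data[j]) - euclid(coords[i], coords[j])) ** 2
        for i in range(n) for j in range(i + 1, n)
    )

def spe(data, dim=2, cycles=100, steps=200, lr0=1.0, lr1=0.01, seed=0, eps=1e-9):
    """Embed data into `dim` dimensions by pairwise stochastic refinement."""
    rng = random.Random(seed)
    n = len(data)
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    for cycle in range(cycles):
        # Learning rate decreases linearly from lr0 to lr1 over the cycles.
        lr = lr0 - (lr0 - lr1) * cycle / max(cycles - 1, 1)
        for _ in range(steps):
            i, j = rng.sample(range(n), 2)
            target = euclid(data[i], data[j])
            current = euclid(coords[i], coords[j])
            # Positive scale pushes the pair apart, negative pulls it together.
            scale = lr * 0.5 * (target - current) / (current + eps)
            for k in range(dim):
                delta = scale * (coords[i][k] - coords[j][k])
                coords[i][k] += delta
                coords[j][k] -= delta
    return coords

# Toy data: four corners of a unit square given as 3D points.
data = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
start = spe(data, cycles=0)   # random initial layout only
final = spe(data)             # same seed, refined layout
print(round(stress(data, start), 3), round(stress(data, final), 3))
```

Each update touches only one pair, so the cost per step does not depend on the number of points, which is what makes this family of methods attractive for large compound sets.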
15. Challenges / Types of Visualization
• Key capabilities for data visualization
– From large data to human comprehension
– High-level summary + drill-down
– Quickly (auto?) isolate interesting data points
[Figure: types of visualization, grouped as 2D, 3D, nD, hierarchical: map, SOM, parallel coords, heat map, protein volume rendering, radar plot, box plot, sunburst, dendrogram, network/graph layout]
Image sources: http://guides.library.duke.edu/datavis/vis_types, http://flagshipbio.com/amino-acid-structure-properties-using-self-organizing-maps/, Wikipedia
16. All the Data at Once: Vlaaivis
T. J. Howe, G. Mahieu, P. Marichal, T. Tabruyn and P. Vugts. Data reduction and representation in drug discovery. Drug Discovery Today 12(1/2):45-53, Jan 2007
17. All the Data at Once (cont’d): Radar Plots
• Circular histogram for viewing multi-parameter results
Paul D. Leeson & Stephen A. St-Gallay. The influence of the 'organizational factor' on compound quality in drug discovery. Nature Reviews Drug Discovery 10, 749-765 (October 2011)
Property differences are scaled to either +1, whereby the company with a positive ('best') property value had the highest magnitude, or −1, whereby the company with the lowest ('worst') value had the highest magnitude.
18. Visualizing Large Datasets
• Dimensionality reduction
• Graph layout
• Activity landscape
• Probabilistic property plots
• Scaffold abstraction
[Figures: molecule cloud; network-like similarity graph (Bajorath et al.); plot from Steven Muchmore, Abbott Labs (now AbbVie), with labels Molecular Property 1, Molecular Property 2 and Probability of success (crossing cell membrane)]
P. Ertl & B. Rohde, J. Cheminformatics 4(12), 2012
Gaspar et al. J. Chem. Inf. Model., 2015, 55 (1), pp 84–94
19. SAR Tables
• SAR: Structure-Activity Relationship
– Split molecule: core/scaffold, pendant R-groups
– SAR Table: molecule spreadsheet with R-groups and Activity Data
[Figure labels: R-groups (-OH), (-COOH)]
20. SAR Maps - R1 vs. R2 on a Core
[Figure: SAR map of R1 vs. R2 on a core “scaffold”; color scale runs from selective for protein 1 to selective for protein 2, based on pIC50(protein 2) − pIC50(protein 1)]
D. K. Agrafiotis et al. SAR Maps: A New SAR Visualization Technique for Medicinal Chemists. J. Med. Chem., 2007, 50 (24), 5926–5937.
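The idea of an R1 vs. R2 map can be illustrated by pivoting a SAR table into a grid whose cells hold the selectivity difference pIC50(protein 2) − pIC50(protein 1). The sketch below uses plain Python and a made-up table; only the -OH and -COOH labels come from the SAR table slide, and every other substituent and value is hypothetical. It is a toy pivot, not the method of the cited paper:

```python
# Hypothetical SAR table: one row per molecule on a shared core (values are made up).
sar_table = [
    {"R1": "-OH",   "R2": "-CH3", "pIC50_1": 6.0, "pIC50_2": 7.5},
    {"R1": "-OH",   "R2": "-Cl",  "pIC50_1": 7.0, "pIC50_2": 6.5},
    {"R1": "-COOH", "R2": "-CH3", "pIC50_1": 5.5, "pIC50_2": 8.0},
]

def sar_map(rows):
    """Pivot rows into {R1: {R2: pIC50_2 - pIC50_1}}; untested combinations stay absent."""
    grid = {}
    for row in rows:
        grid.setdefault(row["R1"], {})[row["R2"]] = row["pIC50_2"] - row["pIC50_1"]
    return grid

grid = sar_map(sar_table)
r1_values = sorted(grid)
r2_values = sorted({r2 for cells in grid.values() for r2 in cells})

# Text rendering: positive cells favour protein 2, negative cells favour protein 1.
print("R1 \\ R2".ljust(10) + "".join(r2.rjust(8) for r2 in r2_values))
for r1 in r1_values:
    cells = grid[r1]
    print(r1.ljust(10) + "".join(
        (f"{cells[r2]:+.1f}" if r2 in cells else "n/a").rjust(8) for r2 in r2_values))
```

Empty cells are as informative as filled ones: they show R1/R2 combinations that have not been made or tested.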
21. Clustering
• Based on chemical descriptors, biological activity, etc…
• Hierarchical, e.g. agglomerative
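As a concrete instance, agglomerative clustering starts with every molecule in its own cluster and repeatedly merges the two most similar clusters. The sketch below uses complete linkage (cluster similarity is the worst pairwise similarity) and Tanimoto similarity on sets of fingerprint bits. The fingerprints, the 0.5 threshold and the function names are assumptions for illustration, and the quadratic search is written for clarity rather than speed:

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def complete_link(fingerprints, min_similarity):
    """Merge clusters while the best complete-link similarity stays >= min_similarity."""
    clusters = [[name] for name in fingerprints]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete link: use the least similar pair across the two clusters.
                sim = min(tanimoto(fingerprints[a], fingerprints[b])
                          for a in clusters[i] for b in clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        if sim < min_similarity:
            break
        clusters[i] += clusters.pop(j)
    return clusters

# Hypothetical fingerprints: each molecule is a set of feature bits (made up).
fingerprints = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 5},
    "mol_c": {2, 3, 4, 5},
    "mol_d": {7, 8, 9},
}
clusters = complete_link(fingerprints, min_similarity=0.5)
print(clusters)
```

Here mol_d shares no bits with the others and is left as a singleton, while each molecule ends up in exactly one cluster.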
[Figure panels: Molecules (http://chemmine.ucr.edu/help/); Genes (Hoek, Keith S. et al.: Metastatic potential of melanomas defined by specific gene expression profiles with no BRAF signature. Pigment Cell Research 19 (4), 290-302)]
22. Limitations of Clustering
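• Each molecule is assigned to a single cluster, which can be limiting
[Figure: clusters labeled seals (fur), ducks (bill), penguins (flipper), with an unassigned item (“?”) left as a singleton]
• Similar molecules ≠ same cluster (figure: Cluster 3 vs. Cluster 10)
• Many singletons (figure axes: Complete Link Cluster ID vs. Cluster Size)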
23. Automatic Decomposition into (All) Overlapping Scaffolds
[Figure: Molecule (malarial parasite assay pIC50 8.1) → Scaffold(s) → Related Molecules; related-molecule counts shown: 49 total, 226 total, 2 total]
24. Next Step: Combine with Activities and Properties
[Figure: Molecule → Scaffold(s) → Annotation → Related Molecules; scaffolds annotated with Avg pIC50 (values shown: 8.2, 8.15, 7.8, 7.8); related-molecule counts 49 total, 226 total, 2 total; individual pIC50 labels shown: 8.5, 8.2, 8.0, 7.5, 7.7, 8.5, 7.4, 7.9, 7.7, 8.2]
25. Case Study: Linking Molecules By Scaffolds
• Use aggregate properties for decision making
• Find related molecules with improved properties
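Both points can be sketched once each molecule has been mapped to all of the scaffolds it contains. The Python example below takes that mapping as given (producing it needs a cheminformatics toolkit and is not shown), then averages pIC50 per scaffold and looks up related molecules with a better value. All IDs and values are hypothetical:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input: each molecule lists ALL scaffolds it contains (overlapping).
molecule_scaffolds = {
    "mol_1": ["scaf_A", "scaf_B"],
    "mol_2": ["scaf_A"],
    "mol_3": ["scaf_B", "scaf_C"],
    "mol_4": ["scaf_C"],
}
pic50 = {"mol_1": 8.0, "mol_2": 7.0, "mol_3": 8.5, "mol_4": 7.5}

# Invert the mapping: scaffold -> molecules that contain it.
scaffold_members = defaultdict(list)
for mol, scaffolds in molecule_scaffolds.items():
    for scaf in scaffolds:
        scaffold_members[scaf].append(mol)

# Aggregate view: average pIC50 per scaffold.
scaffold_avg = {s: mean(pic50[m] for m in mols) for s, mols in scaffold_members.items()}

def better_relatives(mol):
    """Molecules sharing at least one scaffold with mol and having a higher pIC50."""
    related = {m for s in molecule_scaffolds[mol] for m in scaffold_members[s]} - {mol}
    return sorted(m for m in related if pic50[m] > pic50[mol])

print(scaffold_avg)
print(better_relatives("mol_1"))
```

Because a molecule can belong to several scaffolds, it is linked to more neighbours than a single-cluster assignment would give it.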
[Figure, Example 1: axes Improving property 1 vs. Improving activity 2; aggregate (scaffold) view, then drill down (8 molecules)]
[Figure, Example 2: axes Improving activity 3 vs. Improving property 4; keep top half of molecule, substitute bottom half]
26. Summary and Lessons Learned
• Drug discovery has specialized types of data that are best understood by visualization
• Good visualizations can support the making of good decisions (and the converse: GIGO…)
• The human element is important – visuals and analytics should be creatable/usable by scientists
• As new visual analytics experts, consider careers in an industry where you can add value and be creative
– Subtle plug for drug discovery
27. Future Directions and Challenges in Data Visualization for Drug Discovery
• Human vs. Machine or Human + Machine?
• Automate the tedium of data prep/integration
• Intuitiveness by design
• Interconnection by design
• Integration of latest visualization techniques developed for other domains
• Using emerging media, e.g. VR, Kinect
• What can you think of?