Interactive Visual Data Analytics
Wolfgang G. Hoeck, Ph.D.
Senior Manager, Therapeutic Area Systems
Amgen Inc.
Laboratory Data Management, Munich, June 16-17, 2009
Agenda

 A bit about Amgen
 Interactive visual data analytics explained
 Screening and target identification/validation
 Expectations from an interactive visual data analytics
  platform
 Data formats ARE important
 Registration systems: uniquely identifying what you
  are working with
 Bringing data together, the art of data mapping
 From tabular data to data networks
Wolfgang G. Hoeck, Ph.D., Laboratory Data Management Conference, Munich, June 16/17 th, 2009   2
Amgen: A Biotechnology Pioneer
 Founded in 1980, Amgen was
  one of the first biotechnology
  companies to successfully
  discover, develop and make
  protein-based medicines

 Today, we’re leading the
  industry in its next wave of
  innovation by:
   – Developing therapies in
     multiple modalities
   – Driving cutting-edge
     research and development
   – Continuing to advance the
     science of biotechnological
     manufacturing

Our Worldwide Presence

North America: Cambridge, MA; Toronto, ON; West Greenwich, RI; Washington, DC; Burnaby, BC; Bothell, WA; Seattle, WA; Longmont, CO; Boulder, CO; Fremont, CA; South San Francisco, CA; Thousand Oaks, CA; Louisville, KY; Juncos, Puerto Rico; Mexico City, Mexico
Europe: Norway; Denmark; Finland; Sweden; Estonia; Latvia; Lithuania; Russia; Luxembourg; The Netherlands; Belgium; Ireland; England; France; Switzerland; Czech Republic; Poland; Slovakia; Hungary; Greece; Slovenia; Austria; Germany; Italy; Spain; Portugal
Asia-Pacific and Middle East: India; United Arab Emirates; Hong Kong; Australia; New Zealand




Scientific Data are complex, and it’s not
going to get any better
 Target Identification & Validation
      – Gene Expression of Cell Line Panels: 200 x 45000 x 3
             • Understand differential expression of one or a handful of genes
             • Understand expression profile in a particular cell line only
      – Gene Expression of tumor samples: The Cancer Genome
        Atlas
             • Pilot phase: 3 tumor types - 500 GBM/Ovarian Cancer & 200 Lung Cancer
               samples
             • Next years: 25 more tumor types

 Compound/Target Profiling
      – 400+ targets across hundreds of small-molecule compounds
             • Compare target properties with compound properties

 Cell Line Profiling
      – 500 cell lines treated with 50 therapeutic molecules
      – Each cell line has genetic abnormalities in many genes
        (mutations, deletions, insertions, rearrangements, etc.)
Visualization of complex data must be
made available in interactive format




Interactive browsing of pre-analyzed data
– finding cell lines for in-vitro work

 Step 1: Select gene of interest, e.g., EGFR
 Step 2: Select study of interest
 Step 3: Review relative expression pattern
 Step 4: Select cell line(s) for further work
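
The four steps above can be sketched in code. This is a minimal illustration only, not the platform's implementation: the record layout, the study and cell line names, and the definition of "relative expression" (expression divided by the median across cell lines in the selected study) are all assumptions made for the example.

```python
from statistics import median

# Hypothetical pre-analyzed expression records (all values are invented).
records = [
    {"study": "Panel-1", "gene": "EGFR", "cell_line": "Line-1", "expression": 12.0},
    {"study": "Panel-1", "gene": "EGFR", "cell_line": "Line-2", "expression": 4.0},
    {"study": "Panel-1", "gene": "EGFR", "cell_line": "Line-3", "expression": 6.0},
    {"study": "Panel-2", "gene": "EGFR", "cell_line": "Line-1", "expression": 9.0},
    {"study": "Panel-1", "gene": "KRAS", "cell_line": "Line-1", "expression": 5.0},
]


def rank_cell_lines(rows, gene, study):
    """Steps 1-3: pick a gene and a study, then rank cell lines by relative expression."""
    hits = [r for r in rows if r["gene"] == gene and r["study"] == study]
    if not hits:
        return []
    # Assumed definition: expression relative to the study-wide median for this gene.
    baseline = median(r["expression"] for r in hits)
    ranked = [(r["cell_line"], r["expression"] / baseline) for r in hits]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


ranking = rank_cell_lines(records, gene="EGFR", study="Panel-1")

# Step 4: keep cell lines expressing the gene above the median for further work.
top_lines = [name for name, ratio in ranking if ratio > 1.0]
print(ranking)
print(top_lines)
```

In an interactive client the same selection would happen through filters and marking rather than function calls; the sketch only shows the underlying logic.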
Steps to share data in an interactive visual
format

 Determine the location of desired data (one or multiple places and/or formats)
 Run a query against a database/data warehouse
[Role label on slide: Power User]
 Capture a dataset (table) of rows and columns
 Decide on needed analytics & visualizations
 Determine visualization settings and state
 Share the results with other scientists
[Role label on slide: Decision Maker]
 Enable scientists to interact with data
 Enable scientists to download sets of data
We have many choices to visualize data …
 Table
 Bar Chart
 Box Plot
 Scatter Plot (X/Y-Plot)
 Line Chart
 Heatmap
 Parallel Coordinates Plot
  (Profile Chart)
 Network
 Map
 TreeMap
 e-Northern
…and all choices should retain
interactivity




Filtering a set of cell line and gene alteration data to view a particular set of cells and the set of genes harboring deletions
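
The filtering described here amounts to two chained selections on a table of alterations. The following is a minimal sketch under stated assumptions: the column names, the cell line labels, and the rows themselves are hypothetical and do not come from the presentation.

```python
# Hypothetical alteration table: one row per (cell line, gene, alteration) event.
alterations = [
    {"cell_line": "Line-1", "gene": "CDKN2A", "alteration": "deletion"},
    {"cell_line": "Line-1", "gene": "TP53", "alteration": "mutation"},
    {"cell_line": "Line-2", "gene": "PTEN", "alteration": "deletion"},
    {"cell_line": "Line-3", "gene": "CDKN2A", "alteration": "deletion"},
]


def genes_with_deletions(rows, selected_cell_lines):
    """Filter to the chosen cell lines, then to genes harboring deletions."""
    selected = set(selected_cell_lines)
    return sorted(
        {
            r["gene"]
            for r in rows
            if r["cell_line"] in selected and r["alteration"] == "deletion"
        }
    )


result = genes_with_deletions(alterations, ["Line-1", "Line-2"])
print(result)
```

An interactive visualization applies the same predicates through filter controls, so every linked chart updates when the selection changes.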




The ideal interactive, visual data analytics
platform
 A desktop client
      – Rich interactivity, visuals
      – Rich analytics tools

 A server component
      – Configurable security
      – Configurable data access

 A web client
      – Rich interactivity
      – Easy access

 An API for extension capabilities

[Architecture diagram labels: Desktop Clients; Zero-footprint Web Clients; Analysis Server; Web Server; Stats, etc. Server; DB1; DB2; DB3]
From Desktop Client to Analysis Server
 Data Access
      – From files
      – From databases
      – From clipboard
      – From services
 Data Manipulations
      – Data Mapping
      – Data Merging
      – Calculations
      – Data Transformations
 Visualizations
      – Table
      – X/Y-Plot
      – Bar Chart
      – Parallel Coordinate Plot
      – Box-Plot
      – Networks
 Data Storage
      – One or many tabular datasets
 Data Analysis
      – Clustering Methods
             • Hierarchical
             • K-Means
             • PCA
             • SOM
      – Profile Searching
 Documentation
      – Space to explain what was done
 Data Content
      – Tabular Format
      – Multiple Tables
      – Relationships between Tables
 Data Security
      – Group Level Security
      – Function Level Security
      – Integration with Corporate LDAP
 Action Logging
      – Who, When, What
About Data Formats
 Tall-Skinny
      – aka non-pivoted data format
      – Each row represents a single event

 Short-Wide
      – aka pivoted data format
      – Each row represents a summary of
        events in particular circumstances
      – Typically results in “data loss”

 Subject-Verb-Object
      – aka network data format
      – aka nodes and edges
      – Represents complex data relationships,
        i.e., everything has a potential many-to-many relationship
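
The three formats can be compared on one small dataset. The sketch below is illustrative only: the records, column names, and the verb "is expressed in" are assumptions chosen for the example. It shows why pivoting typically results in "data loss": replicate measurements collapse into one summary value per cell.

```python
from collections import defaultdict
from statistics import mean

# Tall-skinny (non-pivoted): each row is a single measurement event.
# All values are invented for illustration.
tall = [
    {"cell_line": "A549", "gene": "EGFR", "replicate": 1, "expression": 8.0},
    {"cell_line": "A549", "gene": "EGFR", "replicate": 2, "expression": 10.0},
    {"cell_line": "A549", "gene": "KRAS", "replicate": 1, "expression": 5.0},
    {"cell_line": "HeLa", "gene": "EGFR", "replicate": 1, "expression": 3.0},
]


def pivot_short_wide(rows):
    """Short-wide (pivoted): one row per cell line, one column per gene."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[(r["cell_line"], r["gene"])].append(r["expression"])
    wide = defaultdict(dict)
    for (cell_line, gene), values in grouped.items():
        # Replicates are averaged here, so the individual events are lost.
        wide[cell_line][gene] = mean(values)
    return dict(wide)


def to_triples(rows):
    """Subject-verb-object (network): each distinct relationship becomes an edge."""
    return sorted({(r["gene"], "is expressed in", r["cell_line"]) for r in rows})


wide = pivot_short_wide(tall)
triples = to_triples(tall)
print(wide)
print(triples)
```

Going from tall-skinny to either of the other formats is straightforward; going back from short-wide is not possible without the original events, which is one reason to store data tall-skinny and pivot on demand.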
We are dealing with a Complex Data Concept
Network – register your entities!
[Diagram: concept network of registered entities]
Entities: Project; Target; Disease; BioProcess; Gene; Pathway; Protein; Cell Line; Tissue
Gene Status values: Wildtype; Mutated; Diff.Expressed; Amplified/Deleted
Protein Status values: Diff.Expressed; Postt.Modified
Relationship labels: has a; is represented by; is translated into; is expressed in; is derived from; is functional in; works in; occurs in; is critical in

Data Assembly and Integration
Data sources:
 Contract Screening Results on monthly spreadsheets
      – POC/Kd values
      – Entrez Gene Symbol
      – Compound Concentration
      – Compound ID
 Human Gene Nomenclature Database
      – Gene Symbol
      – Full Name
      – Gene Synonyms
 KinomeTree Kinase Map
      – Kinase Classification
      – Manual mapping of Gene Symbols to Kinome classes
 Amgen Project/Compound Association
      – Compound registered for specific Amgen Project

Workflow: Contract Screening Data Assembly in Desktop Client, then Publication Step, then Contract Screening Data Assembly in Web Client

Data get assembled in Spotfire based on matching data keys such as Gene Symbol or CompoundID. Visualizations are prepared based on scientist’s input. Filters are organized according to frequency of usage. Adjustments can typically be made in a couple of hours. The final file is published into a web-library accessible via hyperlink. Announcements are made via e-mail and embedded hyperlink.
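
Assembly by matching data keys can be sketched as a series of lookups. This is a hedged illustration in plain Python, not the Spotfire workflow itself: the compound IDs, project names, Kd values, and the `XYZ9` symbol are hypothetical, and the four dictionaries are simplified stand-ins for the four data sources.

```python
# Hypothetical stand-ins for the data sources; all values are invented.
screening = [
    {"compound_id": "CMPD-1", "gene_symbol": "EGFR", "kd_nm": 12.0},
    {"compound_id": "CMPD-1", "gene_symbol": "ERBB1", "kd_nm": 15.0},  # synonym of EGFR
    {"compound_id": "CMPD-2", "gene_symbol": "ABL1", "kd_nm": 40.0},
    {"compound_id": "CMPD-3", "gene_symbol": "XYZ9", "kd_nm": 7.0},  # no mapping exists
]
synonyms = {"ERBB1": "EGFR"}                    # nomenclature: synonym -> official symbol
kinase_class = {"EGFR": "TK", "ABL1": "TK"}     # gene symbol -> kinome class
projects = {"CMPD-1": "Project A", "CMPD-2": "Project B"}  # compound -> project


def assemble(screening, synonyms, kinase_class, projects):
    """Join screening rows to the other sources on Gene Symbol and Compound ID."""
    merged, unmatched = [], []
    for row in screening:
        # Normalize the key first so synonyms match the official gene symbol.
        symbol = synonyms.get(row["gene_symbol"], row["gene_symbol"])
        if symbol not in kinase_class:
            unmatched.append(row)  # keep for manual mapping instead of dropping silently
            continue
        merged.append(
            {
                **row,
                "gene_symbol": symbol,
                "kinase_class": kinase_class[symbol],
                "project": projects.get(row["compound_id"]),
            }
        )
    return merged, unmatched


merged, unmatched = assemble(screening, synonyms, kinase_class, projects)
print(len(merged), "rows assembled;", len(unmatched), "need manual mapping")
```

Returning the unmatched rows separately mirrors the manual mapping step listed for the kinase map: keys that fail to match are the ones that need a curator's attention.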

Viewing and interacting with integrated
data




Biology Visualizations – Pathway Example




Network Visualizations – The hairball
principle




Network Visualizations – The hairball
principle resolved
 Tools to connect Nodes
 Tools to extend Nodes
[Network diagram labels: Kidney; Bladder]
Combining tabular & network
visualizations

 Step 1: Select Disease, then select Therapeutic Molecule
 Step 2: Study Therapeutic Molecule network


Concluding Thoughts

 Interactive visualizations are a key to making complex
  data shareable and understandable
 If interactivity is self-explanatory, adoption is very
  rapid – nobody wants to read a manual
 Analytics can stay in the hands of the power user;
  it does not need to be available to everyone
 Data are not getting any less complex; however,
  with more sophisticated tools even complex data can
  be made accessible and understandable

       Thank you for your time and interest

Contenu connexe

En vedette

2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI Conference2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI ConferenceMegan Sawchuk
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data LaboratoryJ Singh
 
Tableau reseller partner in Australia Bilytica Best business Intelligence com...
Tableau reseller partner in Australia Bilytica Best business Intelligence com...Tableau reseller partner in Australia Bilytica Best business Intelligence com...
Tableau reseller partner in Australia Bilytica Best business Intelligence com...Carie John
 
Whitepaper2012 "Virtual Laboratory for Analytic Geometry" UNAM
Whitepaper2012 "Virtual Laboratory for Analytic Geometry" UNAMWhitepaper2012 "Virtual Laboratory for Analytic Geometry" UNAM
Whitepaper2012 "Virtual Laboratory for Analytic Geometry" UNAMmetagraphos
 
OpenLSH - a framework for locality sensitive hashing
OpenLSH  - a framework for locality sensitive hashingOpenLSH  - a framework for locality sensitive hashing
OpenLSH - a framework for locality sensitive hashingJ Singh
 
Checking in on Healthcare Data Analytics
Checking in on Healthcare Data AnalyticsChecking in on Healthcare Data Analytics
Checking in on Healthcare Data AnalyticsCybera Inc.
 
Exploring the Role of Information Technology Systems in Preventing and Managi...
Exploring the Role of Information Technology Systems in Preventing and Managi...Exploring the Role of Information Technology Systems in Preventing and Managi...
Exploring the Role of Information Technology Systems in Preventing and Managi...Health Informatics New Zealand
 
INCREASING LABORATORY EFFICIENCY AND VALUE OF LABORATORY DATA BY MAXIMISING ...
INCREASING LABORATORY EFFICIENCY AND VALUE  OF LABORATORY DATA BY MAXIMISING ...INCREASING LABORATORY EFFICIENCY AND VALUE  OF LABORATORY DATA BY MAXIMISING ...
INCREASING LABORATORY EFFICIENCY AND VALUE OF LABORATORY DATA BY MAXIMISING ...Keynetix
 
Process Improvement - 10 Essential Ingredients
Process Improvement - 10 Essential IngredientsProcess Improvement - 10 Essential Ingredients
Process Improvement - 10 Essential IngredientsRichard Ouellette
 
Advanced Laboratory Analytics — A Disruptive Solution for Health Systems
Advanced Laboratory Analytics — A Disruptive Solution for Health SystemsAdvanced Laboratory Analytics — A Disruptive Solution for Health Systems
Advanced Laboratory Analytics — A Disruptive Solution for Health SystemsViewics
 
The Evolution of Laboratory Data Systems: Replacing Paper, Streamlining Proce...
The Evolution of Laboratory Data Systems: Replacing Paper, Streamlining Proce...The Evolution of Laboratory Data Systems: Replacing Paper, Streamlining Proce...
The Evolution of Laboratory Data Systems: Replacing Paper, Streamlining Proce...IDBS
 
eHealth: Big Data, Sports Analysis & Clinical Records
eHealth: Big Data, Sports Analysis & Clinical Records eHealth: Big Data, Sports Analysis & Clinical Records
eHealth: Big Data, Sports Analysis & Clinical Records Health Informatics New Zealand
 
Electronic Medical Records - Paperless to Big Data Initiative
Electronic Medical Records - Paperless to Big Data InitiativeElectronic Medical Records - Paperless to Big Data Initiative
Electronic Medical Records - Paperless to Big Data InitiativeData Science Thailand
 
Basics of laboratory internal quality control, Ola Elgaddar, 2012
Basics of laboratory internal quality control, Ola Elgaddar, 2012Basics of laboratory internal quality control, Ola Elgaddar, 2012
Basics of laboratory internal quality control, Ola Elgaddar, 2012Ola Elgaddar
 
Quality control in the medical laboratory
Quality control in the medical laboratoryQuality control in the medical laboratory
Quality control in the medical laboratoryAdnan Jaran
 
Data analysis powerpoint
Data analysis powerpointData analysis powerpoint
Data analysis powerpointjamiebrandon
 
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingBig Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingHealth Catalyst
 

En vedette (18)

2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI Conference2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI Conference
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data Laboratory
 
Tableau reseller partner in Australia Bilytica Best business Intelligence com...
Tableau reseller partner in Australia Bilytica Best business Intelligence com...Tableau reseller partner in Australia Bilytica Best business Intelligence com...
Tableau reseller partner in Australia Bilytica Best business Intelligence com...
 
Whitepaper2012 "Virtual Laboratory for Analytic Geometry" UNAM
Whitepaper2012 "Virtual Laboratory for Analytic Geometry" UNAMWhitepaper2012 "Virtual Laboratory for Analytic Geometry" UNAM
Whitepaper2012 "Virtual Laboratory for Analytic Geometry" UNAM
 
OpenLSH - a framework for locality sensitive hashing
OpenLSH  - a framework for locality sensitive hashingOpenLSH  - a framework for locality sensitive hashing
OpenLSH - a framework for locality sensitive hashing
 
Checking in on Healthcare Data Analytics
Checking in on Healthcare Data AnalyticsChecking in on Healthcare Data Analytics
Checking in on Healthcare Data Analytics
 
Exploring the Role of Information Technology Systems in Preventing and Managi...
Exploring the Role of Information Technology Systems in Preventing and Managi...Exploring the Role of Information Technology Systems in Preventing and Managi...
Exploring the Role of Information Technology Systems in Preventing and Managi...
 
INCREASING LABORATORY EFFICIENCY AND VALUE OF LABORATORY DATA BY MAXIMISING ...
INCREASING LABORATORY EFFICIENCY AND VALUE  OF LABORATORY DATA BY MAXIMISING ...INCREASING LABORATORY EFFICIENCY AND VALUE  OF LABORATORY DATA BY MAXIMISING ...
INCREASING LABORATORY EFFICIENCY AND VALUE OF LABORATORY DATA BY MAXIMISING ...
 
Process Improvement - 10 Essential Ingredients
Process Improvement - 10 Essential IngredientsProcess Improvement - 10 Essential Ingredients
Process Improvement - 10 Essential Ingredients
 
Advanced Laboratory Analytics — A Disruptive Solution for Health Systems
Advanced Laboratory Analytics — A Disruptive Solution for Health SystemsAdvanced Laboratory Analytics — A Disruptive Solution for Health Systems
Advanced Laboratory Analytics — A Disruptive Solution for Health Systems
 
The Evolution of Laboratory Data Systems: Replacing Paper, Streamlining Proce...
The Evolution of Laboratory Data Systems: Replacing Paper, Streamlining Proce...The Evolution of Laboratory Data Systems: Replacing Paper, Streamlining Proce...
The Evolution of Laboratory Data Systems: Replacing Paper, Streamlining Proce...
 
Clinical data analytics
Clinical data analyticsClinical data analytics
Clinical data analytics
 
eHealth: Big Data, Sports Analysis & Clinical Records
eHealth: Big Data, Sports Analysis & Clinical Records eHealth: Big Data, Sports Analysis & Clinical Records
eHealth: Big Data, Sports Analysis & Clinical Records
 
Electronic Medical Records - Paperless to Big Data Initiative
Electronic Medical Records - Paperless to Big Data InitiativeElectronic Medical Records - Paperless to Big Data Initiative
Electronic Medical Records - Paperless to Big Data Initiative
 
Basics of laboratory internal quality control, Ola Elgaddar, 2012
Basics of laboratory internal quality control, Ola Elgaddar, 2012Basics of laboratory internal quality control, Ola Elgaddar, 2012
Basics of laboratory internal quality control, Ola Elgaddar, 2012
 
Quality control in the medical laboratory
Quality control in the medical laboratoryQuality control in the medical laboratory
Quality control in the medical laboratory
 
Data analysis powerpoint
Data analysis powerpointData analysis powerpoint
Data analysis powerpoint
 
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingBig Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
 

Similaire à Dmla0609 Hoeck Presentation

Curation and Characterization of Web Services
Curation and Characterization of Web ServicesCuration and Characterization of Web Services
Curation and Characterization of Web ServicesJose Enrique Ruiz
 
Resource Description Framework Approach to Data Publication and Federation
Resource Description Framework Approach to Data Publication and FederationResource Description Framework Approach to Data Publication and Federation
Resource Description Framework Approach to Data Publication and FederationPistoia Alliance
 
Sowmya Raghavan Strand Life
Sowmya Raghavan Strand LifeSowmya Raghavan Strand Life
Sowmya Raghavan Strand LifeEmTech
 
Biocatalogue Talk Slides
Biocatalogue Talk SlidesBiocatalogue Talk Slides
Biocatalogue Talk SlidesBioCatalogue
 
Tutorial ESWC2011 Building Semantic Sensor Web - 01 - Introduction
Tutorial ESWC2011 Building Semantic Sensor Web - 01 - IntroductionTutorial ESWC2011 Building Semantic Sensor Web - 01 - Introduction
Tutorial ESWC2011 Building Semantic Sensor Web - 01 - IntroductionJean-Paul Calbimonte
 
The Symbiotic Nature of Provenance and Workflow
The Symbiotic Nature of Provenance and WorkflowThe Symbiotic Nature of Provenance and Workflow
The Symbiotic Nature of Provenance and WorkflowEric Stephan
 
Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynoteCarole Goble
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009Ian Foster
 
Measuring electronic resource availability final version
Measuring electronic resource availability final versionMeasuring electronic resource availability final version
Measuring electronic resource availability final versionSanjeet Mann
 
Modeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVModeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVEUDAT
 
Research Objects for FAIRer Science
Research Objects for FAIRer Science Research Objects for FAIRer Science
Research Objects for FAIRer Science Carole Goble
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Sanjay Padhi, Ph.D
 
Next-Generation Medical Analysis | AWS Public Sector Summit 2017
Next-Generation Medical Analysis | AWS Public Sector Summit 2017Next-Generation Medical Analysis | AWS Public Sector Summit 2017
Next-Generation Medical Analysis | AWS Public Sector Summit 2017Amazon Web Services
 
How to Re-architect Teamcenter Footprint
How to Re-architect Teamcenter FootprintHow to Re-architect Teamcenter Footprint
How to Re-architect Teamcenter FootprintMatt Tremmel
 
Preserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of ScholarshipPreserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of Scholarshiptsbbbu
 
Collaboration and Sharing
Collaboration and SharingCollaboration and Sharing
Collaboration and SharingJisc
 
Doing Clever Things with the Semantic Web
Doing Clever Things with the Semantic WebDoing Clever Things with the Semantic Web
Doing Clever Things with the Semantic WebMathieu d'Aquin
 
Object Detection on Dental X-ray Images using R-CNN
Object Detection on Dental X-ray Images using R-CNNObject Detection on Dental X-ray Images using R-CNN
Object Detection on Dental X-ray Images using R-CNNMinhazul Arefin
 
Spark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scaleSpark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scaleAndy Petrella
 

Similaire à Dmla0609 Hoeck Presentation (20)

Curation and Characterization of Web Services
Curation and Characterization of Web ServicesCuration and Characterization of Web Services
Curation and Characterization of Web Services
 
Resource Description Framework Approach to Data Publication and Federation
Resource Description Framework Approach to Data Publication and FederationResource Description Framework Approach to Data Publication and Federation
Resource Description Framework Approach to Data Publication and Federation
 
Sowmya Raghavan Strand Life
Sowmya Raghavan Strand LifeSowmya Raghavan Strand Life
Sowmya Raghavan Strand Life
 
Biocatalogue Talk Slides
Biocatalogue Talk SlidesBiocatalogue Talk Slides
Biocatalogue Talk Slides
 
BioNLPSADI
BioNLPSADIBioNLPSADI
BioNLPSADI
 
Dmla0609 Hoeck Presentation

  • 1. Interactive Visual Data Analytics
      Wolfgang G. Hoeck, Ph.D., Senior Manager, Therapeutic Area Systems, Amgen Inc.
      Laboratory Data Management, Munich, June 16-17, 2009
  • 2. Agenda
      • A bit about Amgen
      • Interactive visual data analytics explained
      • Screening and target identification/validation
      • Expectations from an interactive visual data analytics platform
      • Data formats ARE important
      • Registration systems: uniquely identifying what you are working with
      • Bringing data together, the art of data mapping
      • From tabular data to data networks
      Wolfgang G. Hoeck, Ph.D., Laboratory Data Management Conference, Munich, June 16/17th, 2009
  • 3. Amgen: A Biotechnology Pioneer
      • Founded in 1980, Amgen was one of the first biotechnology companies to successfully discover, develop and make protein-based medicines
      • Today, we’re leading the industry in its next wave of innovation by:
          – Developing therapies in multiple modalities
          – Driving cutting-edge research and development
          – Continuing to advance the science of biotechnological manufacturing
  • 4. Our Worldwide Presence
      • Sites (city level): Cambridge, MA; Toronto, ON; West Greenwich, RI; Washington, DC; Burnaby, BC; Bothell, WA; Seattle, WA; Longmont, CO; Boulder, CO; Fremont, CA; South San Francisco, CA; Thousand Oaks, CA; Mexico City, Mexico; Louisville, KY; Juncos, Puerto Rico
      • Countries and regions: Norway; Denmark; Luxembourg; Finland; The Netherlands; Sweden; Belgium; Estonia; Ireland; Latvia; Lithuania; Russia; England; Czech Republic; France; Poland; Switzerland; Slovakia; Hungary; India; United Arab Emirates; Hong Kong; Greece; Slovenia; Austria; Germany; Italy; Spain; Australia; Portugal; New Zealand
  • 5. Scientific Data are complex, and it’s not going to get any better
      • Target Identification & Validation
          – Gene Expression of Cell Line Panels: 200 x 45000 x 3
              • Understand differential expression of one or a handful of genes
              • Understand expression profile in a particular cell line only
          – Gene Expression of tumor samples: The Cancer Genome Atlas
              • Pilot phase: 3 tumor types - 500 GBM/Ovarian Cancer & 200 Lung Cancer samples
              • Next years: 25 more tumor types
      • Compound/Target Profiling
          – 400+ targets across 100’s of small molecule compounds
              • Compare target properties with compound properties
      • Cell Line Profiling
          – 500 cell lines treated with 50 therapeutic molecules
          – Each cell line has genetic abnormalities in many genes (mutations, deletions, insertions, rearrangements, etc.)
  • 6. Visualization of complex data must be made available in interactive format
  • 7. Interactive browsing of pre-analyzed data – finding cell lines for in-vitro work
      Step 1: Select gene of interest, e.g.: EGFR
      Step 2: Select study of interest
      Step 3: Review relative expression pattern
      Step 4: Select cell line(s) for further work
  • 8. Steps to share data in an interactive visual format
      • Determine the location of desired data (one or multiple places and/or formats)
      • Run a query against a database/data warehouse
      • Capture a dataset (table) of rows and columns
      • Decide on needed analytics & visualizations
      • Determine visualization settings and state
      • Share the results with other scientists
      • Enable scientists to interact with data
      • Enable scientists to download sets of data
      Roles labeled alongside the steps on the slide: Power User, Decision Maker
  • 9. We have many choices to visualize data …
      • Table
      • Bar Chart
      • Box Plot
      • Scatter Plot (X/Y-Plot)
      • Line Chart
      • Heatmap
      • Parallel Coordinates Plot (Profile Chart)
      • Network
      • Map
      • TreeMap
      • e-Northern
  • 10. …and all choices should retain interactivity
      Filtering a set of cell line and gene alteration data to view a particular set of cells and the set of genes harboring deletions
  • 11. The ideal interactive, visual data analytics platform
      • A desktop client
          – Rich interactivity, visuals
          – Rich analytics tools
      • A server component
          – Configurable security
          – Configurable data access
      • A web client
          – Rich interactivity
          – Easy access
      • An API for extension capabilities
      Diagram components: Desktop Clients; Zero-footprint Web Clients; Analysis Server; Web Server; Stats, etc. Server; DB1, DB2, DB3
  • 12. From Desktop Client to Analysis Server
      • Data Access
          – From files
          – From databases
          – From clipboard
          – From services
      • Data Manipulations
          – Data Mapping
          – Data Merging
          – Calculations
          – Data Transformations
      • Visualizations
          – Table
          – X/Y-Plot
          – Bar Chart
          – Parallel Coordinate Plot
          – Box-Plot
          – Networks
      • Data Storage
          – One or many tabular datasets
      • Data Analysis
          – Clustering Methods
              • Hierarchical
              • K-Means
              • PCA
              • SOM
          – Profile Searching
      • Documentation
          – Space to explain what was done
      • Data Content
          – Tabular Format
          – Multiple Tables
          – Relationships between Tables
      • Data Security
          – Group Level Security
          – Function Level Security
          – Integration with Corporate LDAP
      • Action Logging
          – Who, When, What
  • 13. About Data Formats
      • Tall-Skinny
          – aka non-pivoted data format
          – Each row represents a single event
      • Short-Wide
          – aka pivoted data format
          – Each row represents a summary of events in particular circumstances
          – Typically results in “data loss”
      • Subject-Verb-Object
          – aka network data format
          – aka nodes and edges
          – Represents complex data relationships, i.e.: everything has a potential many-to-many relationship
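
The difference between the tall-skinny and short-wide formats can be sketched in a few lines of Python. This is an illustration only, not part of the presentation: the records, the field names, and the `pivot_mean` helper are all hypothetical, and averaging is just one possible way to summarize.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tall-skinny records: one row per measurement event.
# The values are invented for illustration.
tall = [
    {"cell_line": "A549", "gene": "EGFR", "replicate": 1, "expression": 8.0},
    {"cell_line": "A549", "gene": "EGFR", "replicate": 2, "expression": 10.0},
    {"cell_line": "A549", "gene": "KRAS", "replicate": 1, "expression": 5.0},
    {"cell_line": "HCT116", "gene": "EGFR", "replicate": 1, "expression": 4.0},
]


def pivot_mean(rows, index, column, value):
    """Pivot tall rows into {index_value: {column_value: mean(value)}}."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[index], row[column])].append(row[value])
    wide = defaultdict(dict)
    for (idx, col), values in groups.items():
        # Summarizing collapses the individual events into one number.
        wide[idx][col] = mean(values)
    return {idx: dict(cols) for idx, cols in wide.items()}


# Short-wide result: one entry per cell line, one key per gene.
wide = pivot_mean(tall, "cell_line", "gene", "expression")
```

After the pivot, `wide["A549"]["EGFR"]` holds a single averaged value and the two replicate measurements can no longer be told apart, which is the kind of "data loss" the slide refers to.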
  • 14. We are dealing with a Complex Data Concept Network – register your entities!
      Diagram (node-to-node wiring not recoverable from the transcript):
      • Entities: Disease, Project, Target, BioProcess, Gene, Pathway, Protein, Protein Status, Gene Status, Cell Line, Tissue
      • Relationships: is critical in, has a, is represented by, works in, occurs in, is translated into, is functional in, is expressed in, is derived from
      • Status values: Diff.Expressed, Postt.Modified, Wildtype, Mutated, Amplified/Deleted
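
A concept network like this one can be held as subject-verb-object triples and walked as a graph. The sketch below is a hedged illustration: the entity and relationship names are taken from the slide, but which entities each relationship connects is an assumption made here, and `build_graph` and `find_path` are hypothetical helpers.

```python
from collections import defaultdict, deque

# Illustrative triples only: node and verb names come from the slide,
# but the pairing of subjects and objects is assumed for this example.
triples = [
    ("Gene", "is translated into", "Protein"),
    ("Gene", "is expressed in", "Cell Line"),
    ("Cell Line", "is derived from", "Tissue"),
    ("Protein", "works in", "Pathway"),
]


def build_graph(triples):
    """Index directed edges by subject: {subject: [(verb, object), ...]}."""
    graph = defaultdict(list)
    for subject, verb, obj in triples:
        graph[subject].append((verb, obj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the hops as triples, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for verb, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, verb, nxt)]))
    return None


graph = build_graph(triples)
path = find_path(graph, "Gene", "Tissue")
```

Because every edge is just another row of three values, new entity types or relationships can be added without changing any table schema.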
  • 15. Data Assembly and Integration
      Data sources:
      • Contract Screening Results on monthly spreadsheets
          – POC/Kd values
          – Entrez Gene Symbol
          – Compound Concentration
          – Compound ID
      • Human Gene Nomenclature Database
          – Gene Symbol
          – Full Name
          – Gene Synonyms
      • KinomeTree Kinase Map
          – Kinase Classification
          – Manual mapping of Gene Symbols to Kinome classes
      • Amgen Project/Compound Association
          – Compound registered for specific Amgen Project
      Workflow:
      • Contract Screening Data Assembly in Desktop Client: Data get assembled in Spotfire based on matching data keys such as Gene Symbol or CompoundID. Visualizations are prepared based on scientist’s input. Filters are organized according to frequency of usage. Adjustments can typically be made in a couple of hours.
      • Publication Step: The final file is published into a web-library accessible via hyperlink. Announcements are made via e-mail and embedded hyperlink.
      • Contract Screening Data Assembly in Web Client
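
Assembly by matching data keys can be sketched as a simple lookup join. The presentation does this inside Spotfire; the plain Python below is only an analogy under stated assumptions. All rows, compound IDs, project names, and the `assemble` helper are invented for illustration, and `XYZ9` is a deliberately unmapped placeholder symbol.

```python
# Hypothetical screening rows; field names follow the slide, values are invented.
screening = [
    {"gene_symbol": "EGFR", "compound_id": "CMPD-1", "kd_nM": 12.0},
    {"gene_symbol": "ABL1", "compound_id": "CMPD-1", "kd_nM": 850.0},
    {"gene_symbol": "XYZ9", "compound_id": "CMPD-2", "kd_nM": 3.0},
]

# Hypothetical lookup tables keyed on the shared data keys.
kinase_map = {"EGFR": "TK", "ABL1": "TK"}  # Gene Symbol -> kinome class
projects = {"CMPD-1": "Project A", "CMPD-2": "Project B"}  # Compound ID -> project


def assemble(screening, kinase_map, projects):
    """Attach kinome class and project to each row; collect rows missing a key."""
    merged, unmatched = [], []
    for row in screening:
        out = dict(row)
        out["kinase_class"] = kinase_map.get(row["gene_symbol"])
        out["project"] = projects.get(row["compound_id"])
        if out["kinase_class"] is None or out["project"] is None:
            unmatched.append(out)  # needs manual mapping before publishing
        else:
            merged.append(out)
    return merged, unmatched


merged, unmatched = assemble(screening, kinase_map, projects)
```

Keeping the unmatched rows in a separate list, rather than dropping them silently, makes the manual mapping step mentioned on the slide visible.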
  • 16. Viewing and interacting with integrated data
  • 17. Biology Visualizations – Pathway Example
  • 18. Network Visualizations – The hairball principle
  • 19. Network Visualizations – The hairball principle resolved
      • Tools to connect Nodes
      • Tools to extend Nodes
      Labels shown in the figure: Kidney, Bladder
  • 20. Combining tabular & network visualizations
      Step 1: Select Disease, then select Therapeutic Molecule
      Step 2: Study Therapeutic Molecule network
  • 21. Concluding Thoughts
      • Interactive visualizations are a key to making complex data shareable and understandable
      • If interactivity is self-explanatory, adoption is very rapid – nobody wants to read a manual
      • Analytics can be accomplished in the hands of the power user; it does not need to be available for everyone
      • Data complexity is not getting any simpler; however, with more sophisticated tools, even complex data can be made accessible and understandable
      Thank you for your time and interest