| 0
Dr Jabe Wilson, Elsevier R&D Solutions Professional Services
Zen and the Art of Data Science Maintenance
Bio-IT World, 17 May 2018
| 1
The experience of doing Data Science
“Programming is never easy …
you’re kind of always on this
frontier where you are out of your
depth. And one of the things you
have to learn is to accept this
feeling – of being constantly
wrong. Which makes coding
sound like a branch of Zen
Buddhism”
- Andrew Smith, Code to Joy.
April/May. The Economist, 1843.
| 2
Data Science as an Art
• Intuition
• Qualitative insights
• Exploring a problem through
solutions
• “Inspiration exists, but it has to
find you working.”
- Pablo Picasso
Studio, Tony Wilson (1973) my father. http://www.tonywilsonpainterprintmaker.com/
| 3
What can go wrong doing Data Science
• Bad Data
• Bad Models
• Opaque Predictions
| 4
Good Data
Curation:
• Tagging against dictionaries
• Mapping dictionaries
• Regularising numeric units
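Regularising numeric units can be sketched in a few lines of Python. This is a minimal illustration only, assuming concentrations arrive as free-text strings such as "6.3nM" or "10 mM"; the function name `to_nanomolar` and the unit table are hypothetical and not part of any tool named in these slides.

```python
import re

# Conversion factors to nanomolar (nM); an illustrative subset of molar units.
_TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "µM": 1e3, "mM": 1e6, "M": 1e9}

# A number, optional whitespace, then one of the known unit symbols.
_PATTERN = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*(pM|nM|uM|µM|mM|M)\s*$")


def to_nanomolar(text: str) -> float:
    """Parse a concentration string and return its value in nM."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised concentration: {text!r}")
    value, unit = match.groups()
    return float(value) * _TO_NM[unit]


for raw in ["6.3nM", "10 mM", "1 uM"]:
    print(raw, "->", to_nanomolar(raw), "nM")
```

Once every value is in one unit, activities from different sources can be compared or filtered directly.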
| 5
Right Data
Depends on your choice of
model (semantic or machine
learning).
What happens when the model
changes (do you still have
enough data)?
| 6
In-time Data
• Transactional workflows
• Dynamic knowledge hubs
• Opportunity costs
| 7
Examples of Data Science in
practice
| 8
Examples of Data Science in practice
• Rare disease treatment: Highly curated data allows
us to make predictions, but building the model also
required judgement
• Translational safety: Concordance data is
predictive, but also shows the importance of curating
taxonomies
• Evidence selection: To select the right
information sets you need to be able to filter on the
context of parameter-based assertions (machine
learning can help improve data selection)
• Real World Data interpretation: Machine learning
classification can be enhanced with taxonomies, and
can also deliver across multimodal data sets
| 10
Biological Pathways
extracted via semantic
text mining
A upregulates B
B upregulates C
C increases
Disease
A → B → C → disease
Bioactivities
through text analysis
IC50 6.3nM, kinase binding
assay 10mM concentration
Chemical Structures
And Properties
InChI, Name, NCBI, UniProt, EMTREE, ReaxysTree, Structures
Normalizing vocabularies required: proteins, diseases, drugs, chemicals
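The pathway chain on this slide (A upregulates B, B upregulates C, C increases disease) can be illustrated with a small graph search over text-mined triples. The sketch below is an assumption about how such statements might be chained, not the method used in the talk; `build_graph` and `find_chain` are hypothetical helper names, and A, B, C are the slide's own placeholders.

```python
from collections import defaultdict, deque

# Text-mined (subject, verb, object) statements, using the slide's placeholders.
triples = [
    ("A", "upregulates", "B"),
    ("B", "upregulates", "C"),
    ("C", "increases", "disease"),
]


def build_graph(statements):
    """Index statements by subject so that outgoing relations are easy to follow."""
    graph = defaultdict(list)
    for subject, verb, obj in statements:
        graph[subject].append((verb, obj))
    return graph


def find_chain(graph, start, goal):
    """Breadth-first search; returns the shortest chain of entities, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _verb, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


chain = find_chain(build_graph(triples), "A", "disease")
print(" -> ".join(chain))
```

Chaining only works if the same entity always carries the same name, which is why normalised vocabularies for proteins, diseases, drugs and chemicals are a prerequisite.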
| 11
• Very large data sets
- Order of ~10^7 documents published (patents, journals, books)
- Each document has ~200 sentences, giving ~10^9 statements.
- Statements are about molecules, properties, reactions, indications etc.
• Combinatorial connections between large data sets
- “connecting the dots” among these facts results in a very large number of
possible connections
- $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ combinations of k elements chosen from a pool of n.
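To give a feel for how quickly the number of possible connections grows, the binomial coefficient can be evaluated directly. This is a small sketch using Python's standard `math.comb`; the pool size of 10^9 is taken from the order-of-magnitude statement count above, and the wrapper name `n_combinations` is only illustrative.

```python
import math


def n_combinations(n: int, k: int) -> int:
    """Number of ways to choose k elements from a pool of n: n! / (k! (n-k)!)."""
    return math.comb(n, k)


# Small sanity check against the factorial definition.
assert n_combinations(10, 2) == math.factorial(10) // (math.factorial(2) * math.factorial(8))

# Pairwise connections among ~10^9 statements.
pairs = n_combinations(10**9, 2)
print(f"{pairs:.3e} possible pairs")
```

Even pairwise connections among 10^9 statements number roughly 5 × 10^17, and the count grows further for larger k, so exhaustive enumeration is not practical.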
What Constitutes Big Data?
Pathways
• Relationships mined
from 12,000 titles,
25M documents
• <subject> <verb>
<object> relationships
• Each subject, object,
verb has a taxonomy
• Example: “protein”
causes/induces
disease
Compounds
• 16,000 journal titles
plus patent offices
• Compounds,
Reactions,
Properties
• Over 6 million
compounds with
bioactivity
Bioassays
• Biological relationships
mined from journals/patents
(over 16 million)
• <compound> <verb>
<object> <quantity>
• Example: Sunitinib binds-to
Bcr-abl in <assay type> at
1nM
| 12
Building and refining the disease model for hyperinsulinism
Picked relevant
pathways
(from a collection of 1800
models)
Explored functions of
proteins using 6.2M
pre-text-mined relations
and embedded Gene
Ontology
Summarized what is known
about the congenital hyperinsulinism (CHI)
mechanism in an overview model
| 13
Automated analysis combines bioassay data with text-mined data
• 88 targets related to
hyperinsulinism with ≥3
literature references
• Full relationship
information
Find all targets that
could be used to affect
the disease state
Step 1
From pathways to treatments
| 14
Automated analysis combines bioassay data with text-mined data
Find all targets that
could be used to affect
the disease state
Query for each protein to find
compounds that target it (>6
log units)
Step 1 Step 2
Targets based on
text mining
Approved
compounds
Bioassay data
From pathways to treatments
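Step 2 filters bioassay data by activity in log units. A minimal sketch follows, assuming that "log units" means the negative base-10 logarithm of the molar IC50 (often written pIC50), so that ">6" corresponds to an IC50 below 1 µM. The records, compound names and the helper `active_compounds` are hypothetical.

```python
import math


def pactivity(ic50_nm: float) -> float:
    """Convert an IC50 in nM to -log10(molar IC50), i.e. 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nm)


# Hypothetical bioassay records; real data would come from a curated source.
bioassays = [
    {"compound": "cmpd_1", "target": "T1", "ic50_nm": 6.3},
    {"compound": "cmpd_2", "target": "T1", "ic50_nm": 5000.0},
    {"compound": "cmpd_3", "target": "T2", "ic50_nm": 1.0},
]


def active_compounds(records, target, threshold=6.0):
    """Compounds whose activity against `target` exceeds `threshold` log units."""
    return sorted(
        {
            r["compound"]
            for r in records
            if r["target"] == target and pactivity(r["ic50_nm"]) > threshold
        }
    )


print(active_compounds(bioassays, "T1"))
```

Under this assumption an IC50 of 6.3 nM is about 8.2 log units and passes the filter, while 5000 nM is about 5.3 and does not.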
| 15
Automated analysis combines bioassay data with text-mined data
Mean of activities
among these targets
Targets and activities for
each compound
Drug-likeness
metrics for
sorting/classification
• All compounds that
were observed to bind
to targets in pathway
• Sorted by number of
active targets.
Too many targets may
suggest lack of specificity.
Find all targets that
could be used to affect
the disease state
Query for each protein to find
compounds that target it (>6
log units)
Collate data by compound to summarize the
targets/activities related to disease that the
compound hits
• Compute geometric mean of activities for ranking
• Rank by number of targets and geometric mean of
activities against targets
Step 1 Step 2
Step 3
From pathways to treatments
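Step 3 collates hits by compound and ranks them by number of targets and geometric mean of activities. The sketch below shows one way this could look, assuming activities are positive values in log units and using the standard-library `statistics.geometric_mean`; the rows and the name `rank_compounds` are hypothetical.

```python
from collections import defaultdict
from statistics import geometric_mean

# Hypothetical (compound, target, activity) rows that passed the Step 2 filter.
hits = [
    ("cmpd_1", "T1", 8.0),
    ("cmpd_1", "T2", 8.0),
    ("cmpd_2", "T1", 9.0),
    ("cmpd_3", "T1", 4.0),
    ("cmpd_3", "T3", 9.0),
]


def rank_compounds(rows):
    """Return (compound, n_targets, geometric_mean_activity), best ranked first."""
    by_compound = defaultdict(dict)
    for compound, target, activity in rows:
        by_compound[compound][target] = activity
    summary = [
        (compound, len(targets), geometric_mean(list(targets.values())))
        for compound, targets in by_compound.items()
    ]
    # More targets first, then higher geometric mean activity.
    return sorted(summary, key=lambda row: (-row[1], -row[2]))


ranked = rank_compounds(hits)
for compound, n_targets, gmean in ranked:
    print(f"{compound}: {n_targets} targets, geometric mean {gmean:.2f}")
```

Because hitting too many targets may indicate a lack of specificity, a real ranking would probably cap or penalise the target count rather than simply sort on it.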
| 16
Approved compounds that may treat hyperinsulinism
• Each binds to one or
more targets related
to the disease
• Can easily be
obtained and tested
in preclinical studies
• List includes a
compound known to
treat hyperinsulinism,
sirolimus
| 17
Example: Process for Finding New Indications for a Drug (Ruxolitinib)
Find all targets for which
the compound has high
affinity
Collate the diseases by targets
and activity of the compound
Using the unique set of proteins
from step 1, search for all
diseases reported to be related
to them
Step 1 Step 2 Step 3
Find all compound-
protein/gene relationships
with > 1 reference using
text analysis
Targets
inhibited
Targets
Related to
Disease
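The three steps can be sketched as a pair of lookups followed by a collation. Everything in this example is hypothetical: the compound, proteins, diseases and activity values are placeholders, and the cut-off for "high affinity" (6 log units) is an assumption, since the slide does not state one.

```python
# Hypothetical inputs: compound -> {protein: activity in log units},
# and protein -> set of diseases reported to be related to it.
compound_targets = {"drug_X": {"P1": 8.5, "P2": 7.2, "P3": 4.9}}
target_diseases = {
    "P1": {"disease_a", "disease_b"},
    "P2": {"disease_b"},
    "P3": {"disease_c"},
}


def candidate_indications(compound, activities, diseases, min_activity=6.0):
    """Collate diseases by the high-affinity targets that link them to the compound."""
    # Step 1: targets for which the compound has high affinity.
    strong = {t: a for t, a in activities[compound].items() if a >= min_activity}
    # Steps 2 and 3: look up diseases per target and collate by disease.
    collated = {}
    for target in sorted(strong):
        for disease in diseases.get(target, ()):
            collated.setdefault(disease, {})[target] = strong[target]
    return collated


result = candidate_indications("drug_X", compound_targets, target_diseases)
for disease in sorted(result):
    print(disease, result[disease])
```

Diseases supported by several strongly bound targets, such as `disease_b` here, would be the first candidates to review.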
| 18
This Analysis Shows Connections of Ruxolitinib to Alopecia
A cancer drug that grows hair! Trials are under way
Alopecia areata is driven by cytotoxic T lymphocytes and is reversed by JAK inhibition
Nature Medicine 20, 1043–1049 (2014) doi:10.1038/nm.3645
Global transcriptional profiling of mouse and human AA skin revealed gene expression
signatures indicative of cytotoxic T cell infiltration, an interferon-γ (IFNG) response and
upregulation of several γ-chain (γc) cytokines known to promote the activation and
survival of IFN-γ–producing CD8+NKG2D+ effector T cells. Therapeutically, antibody-
mediated blockade of IFN-γ, interleukin-2 (IL-2) or interleukin-15 receptor β (IL-15Rβ)
prevented disease development, reducing the accumulation of CD8+NKG2D+ T cells in the
skin and the dermal IFN response in a mouse model of AA.
| 20
• Concordance between
preclinical studies and human
adverse events, based on the
calculation of positive likelihood
ratios.
- Chi-squared tells us if there is a
statistically significant
relationship of any kind
between the human and animal
observations (which is used as
a filter).
- The likelihood ratio measures
the predictive value of the
animal observation.
A translational safety big data analysis
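Both statistics can be computed from a 2×2 table of animal observation against human adverse event. The sketch below uses the standard Pearson chi-squared formula (without continuity correction) and the usual definition of the positive likelihood ratio, sensitivity / (1 − specificity). The counts are invented for illustration, and the function names are hypothetical.

```python
def chi_squared(tp: int, fp: int, fn: int, tn: int) -> float:
    """Pearson chi-squared statistic for a 2x2 table (1 degree of freedom)."""
    n = tp + fp + fn + tn
    numerator = n * (tp * tn - fp * fn) ** 2
    denominator = (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    return numerator / denominator


def positive_likelihood_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    if false_positive_rate == 0:
        return float("inf")
    return sensitivity / false_positive_rate


# Invented counts: tp = seen in animal and human, fp = animal only,
# fn = human only, tn = seen in neither.
counts = dict(tp=30, fp=10, fn=20, tn=140)
print(f"chi-squared = {chi_squared(**counts):.2f}")
print(f"LR+ = {positive_likelihood_ratio(**counts):.2f}")
```

For these counts the chi-squared value is about 66.7 and LR+ is 9.0: the association is significant, and a positive animal finding makes the human event considerably more likely.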
| 21
• If the chi-squared is high, and
the likelihood ratio is low, one
can state that there is high
confidence that the animal
observation does not predict
human observation.
• In which case the animal
model should not be used.
A translational safety big data analysis
| 22
• If the chi-squared is high, and
the likelihood ratio is high, one
can state that there is high
confidence that the animal
observation does predict
human observation.
• In which case checks for
adverse events can be added
to clinical trials.
A translational safety big data analysis
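The two cases above amount to a simple decision rule. A minimal sketch follows, with loudly stated assumptions: the slides say only "high" and "low", so the chi-squared cut-off of 3.841 (the 5% critical value for one degree of freedom) and the likelihood-ratio cut-off of 1.0 are illustrative defaults, and `interpret` is a hypothetical name.

```python
def interpret(
    chi2: float,
    lr_plus: float,
    chi2_critical: float = 3.841,  # 5% critical value, 1 degree of freedom
    lr_cutoff: float = 1.0,  # illustrative cut-off, not taken from the slides
) -> str:
    """Classify an animal/human observation pair from its two statistics."""
    if chi2 < chi2_critical:
        # Chi-squared acts as the filter: no significant relationship of any kind.
        return "no significant relationship"
    if lr_plus > lr_cutoff:
        return "animal observation predicts human observation"
    return "animal observation does not predict human observation"


print(interpret(chi2=66.7, lr_plus=9.0))
print(interpret(chi2=40.0, lr_plus=0.3))
print(interpret(chi2=1.2, lr_plus=5.0))
```

In practice the cut-offs would be chosen with toxicologists, since they determine which animal models are retained and which adverse events are monitored in clinical trials.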
| 23
• Curation of taxonomy data.
• The higher levels of the
MedDRA hierarchy sometimes
include such a variety of
events that the additional false
positives and negatives result
in no statistical confidence in
the relationship.
A translational safety big data analysis
| 25
Cold mice problem
• If we can interpret and classify complex parameter-based
statements, we can select the right data.
22°C Cage (Standard Housing) → Stress/Immune response to cold → Decreased response to chemotoxic drugs
30°C Cage (Thermoneutrality) → No Immune response to cold → Increased response to chemotoxic drugs
| 26
All mice were maintained in a temperature controlled (22 ± 2 °C) environment 12-h light 12-h
dark photocycle and fed rodent chow meal .
The mice were individually placed into an acrylic cylinder (25 cm height 10 cm diameter)
containing 8 cm of water maintained at 22–24 °C
Cold mice problem: results
Research reports can then be filtered on whether their results are
likely to be reliable given the experimental conditions.
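A rule-based version of this filter can be sketched with a regular expression. It is only an illustration of the idea, not the machine-learning classifier referred to in the talk: the housing cue phrases, the 28 °C thermoneutral cut-off and the function names are all assumptions. The two example sentences are the ones quoted on this slide.

```python
import re

# A number, optionally followed by "± tolerance", then a degree-Celsius symbol.
TEMP = re.compile(r"(\d+(?:\.\d+)?)\s*(?:±\s*\d+(?:\.\d+)?\s*)?°\s*C")

# Hypothetical cue phrases suggesting the sentence describes animal housing.
HOUSING_CUES = ("housed", "maintained in", "cage")


def housing_temperature(sentence: str):
    """Return the housing temperature in °C, or None if the sentence is not about housing."""
    lowered = sentence.lower()
    if not any(cue in lowered for cue in HOUSING_CUES):
        return None
    match = TEMP.search(sentence)
    return float(match.group(1)) if match else None


def housing_class(temp_c: float, thermoneutral_min: float = 28.0) -> str:
    """Label a housing temperature; the 28 °C cut-off is an illustrative assumption."""
    return "thermoneutral" if temp_c >= thermoneutral_min else "standard housing"


housing = (
    "All mice were maintained in a temperature controlled (22 ± 2 °C) "
    "environment 12-h light 12-h dark photocycle and fed rodent chow meal."
)
swim_test = (
    "The mice were individually placed into an acrylic cylinder "
    "(25 cm height 10 cm diameter) containing 8 cm of water maintained at 22–24 °C"
)

print(housing_temperature(housing))
print(housing_temperature(swim_test))
```

The second sentence also mentions a temperature, but it describes water in a test apparatus rather than housing. Telling such contexts apart reliably is where the simple cue list runs out.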
| 28
Real World Data interpretation
• Machine Learning:
- Classify images.
- Classify concepts (combining taxonomies with word embeddings
improves performance on similarity measurement and entity
classification).
• Opportunities for developing multimodal classification of data
sources with unstructured text and unlabelled images.
| 29
• These use case examples
illustrate the challenges and
creativity required to deliver
Data Science.
• We are developing a platform
to help support these activities.
o Good data: curated
data.
o Right data: export graph
and feature data.
o In-time data: bringing
data sets together in a
knowledge hub.
Supporting Data Scientists to deliver results
| 30
A platform for supporting Data Scientists
• Inspiration exists, but it has to
find you working.
- Pablo Picasso
• If you want to become a Data
Science Platform development
partner, or wish to hear more
about continuing developments
around Data Science at Elsevier
please contact me:
• www.linkedin.com/in/jabewilson/
• jabe.wilson@elsevier.com
Studio, Tony Wilson (1973) my father. http://www.tonywilsonpainterprintmaker.com/
| 31
Acknowledgements
• Helena F. Deus, Corey Harper, Darin McBeath and Ron Daniel Jr –
Elsevier Labs
• Matthew Clark, Frederik van den Broek, Anton Yuryev, Maria Shkrob
– Elsevier Professional services
• Thomas Steger-Hartmann, Investigational Toxicology, Bayer AG

Machine Learning and AI, by Helena Deus, PhD
 
Gender Report 2017 Infographic – Focus on Engineering
Gender Report 2017 Infographic – Focus on EngineeringGender Report 2017 Infographic – Focus on Engineering
Gender Report 2017 Infographic – Focus on Engineering
 
Elsevier Gender Report Infographic – Focus on Computer Science
Elsevier Gender Report Infographic – Focus on Computer ScienceElsevier Gender Report Infographic – Focus on Computer Science
Elsevier Gender Report Infographic – Focus on Computer Science
 
Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona
Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona
Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona
 
Gender Report Infographic: Elsevier 2017
Gender Report Infographic: Elsevier 2017Gender Report Infographic: Elsevier 2017
Gender Report Infographic: Elsevier 2017
 
Elsevier Cancer Moonshot Infographic
Elsevier Cancer Moonshot InfographicElsevier Cancer Moonshot Infographic
Elsevier Cancer Moonshot Infographic
 
Elsevier Society Member Survey
Elsevier Society Member SurveyElsevier Society Member Survey
Elsevier Society Member Survey
 
Food Security: an information provider’s view
Food Security: an information provider’s viewFood Security: an information provider’s view
Food Security: an information provider’s view
 
Response from OFAC to Elsevier, October 2015
Response from OFAC to Elsevier, October 2015Response from OFAC to Elsevier, October 2015
Response from OFAC to Elsevier, October 2015
 
Sustainability Science in a Global Landscape
Sustainability Science in a Global LandscapeSustainability Science in a Global Landscape
Sustainability Science in a Global Landscape
 
Research Performance in South-East Asia: Executive Summary
Research Performance in South-East Asia: Executive SummaryResearch Performance in South-East Asia: Executive Summary
Research Performance in South-East Asia: Executive Summary
 
Mendeley Report: New Horizons: From Research Paper to Pluto
Mendeley Report: New Horizons: From Research Paper to PlutoMendeley Report: New Horizons: From Research Paper to Pluto
Mendeley Report: New Horizons: From Research Paper to Pluto
 
Infographic: The Noble Nurse
Infographic: The Noble NurseInfographic: The Noble Nurse
Infographic: The Noble Nurse
 
Jennifer Saul's presentation for Cambridge University's gender equality summit
Jennifer Saul's presentation for Cambridge University's gender equality summit Jennifer Saul's presentation for Cambridge University's gender equality summit
Jennifer Saul's presentation for Cambridge University's gender equality summit
 
Open access survey
Open access surveyOpen access survey
Open access survey
 
Presentation: A Decade of Development in Sub-Saharan African STEM Research
Presentation: A Decade of Development in Sub-Saharan African STEM ResearchPresentation: A Decade of Development in Sub-Saharan African STEM Research
Presentation: A Decade of Development in Sub-Saharan African STEM Research
 
Culinary Nutrition: garlic-enhanced-mashed-potatoes
Culinary Nutrition: garlic-enhanced-mashed-potatoesCulinary Nutrition: garlic-enhanced-mashed-potatoes
Culinary Nutrition: garlic-enhanced-mashed-potatoes
 
Kudos: How it Works
Kudos: How it WorksKudos: How it Works
Kudos: How it Works
 

Dernier

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Dernier (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Zen and the Art of Data Science Maintenance

  • 1. | 0 Dr Jabe Wilson, Elsevier R&D Solutions Professional Services Zen and the Art of Data Science Maintenance Bio-IT World, 17 May 2018
  • 2. | 1 The experience of doing Data Science “Programming is never easy … you’re kind of always on this frontier where you are out of your depth. And one of the things you have to learn is to accept this feeling – of being constantly wrong. Which makes coding sound like a branch of Zen Buddhism” - Andrew Smith, Code to Joy. The Economist, 1843, April/May.
  • 3. | 2 Data Science as an Art • Intuition • Qualitative insights • Exploring a problem through solutions • “Inspiration exists, but it has to find you working.” - Pablo Picasso Studio, Tony Wilson (1973) my father. http://www.tonywilsonpainterprintmaker.com/
  • 4. | 3 What can go wrong doing Data Science • Bad Data • Bad Models • Opaque Predictions
  • 5. | 4 Good Data Curation: • Tagging against dictionaries • Mapping dictionaries • Regularising numeric units
  • 6. | 5 Right Data Depends on your choice of model (semantic or machine learning). What happens when the model changes (do you still have enough data)?
  • 7. | 6 In-time Data • Transactional workflows • Dynamic knowledge hubs • Opportunity costs
  • 8. | 7 Examples of Data Science in practice
  • 9. | 8 Examples of Data Science in practice • Rare disease treatment: Highly curated data allows us to make predictions, but also required judgement in the building of the model • Translational safety: Concordance data is predictive; but also shows the importance of curating taxonomies • Evidence selection: In order to select the right information sets you need to be able to filter on the context of parameter based assertions (machine learning can help improve data selection) • Real World Data interpretation: Machine learning classification can be enhanced with taxonomies, but also deliver across multimodal data sets
  • 10. | 9 Examples of Data Science in practice • Rare disease treatment: Highly curated data allows us to make predictions, but also required judgement in the building of the model • Translational safety: Concordance data is predictive; but also shows the importance of curating taxonomies • Evidence selection: In order to select the right information sets you need to be able to filter on the context of parameter based assertions (machine learning can help improve data selection) • Real World Data interpretation: Machine learning classification can be enhanced with taxonomies, but also deliver across multimodal data sets
  • 11. | 10 Biological Pathways extracted via semantic text mining: A upregulates B, B upregulates C, C increases Disease (A → B → C → disease). Bioactivities through text analysis: IC50 6.3nM, kinase binding assay, 10mM concentration. Chemical Structures And Properties: InChI, Name. NCBI, UniProt, EMTREE, ReaxysTree, Structures. Normalizing vocabularies required: proteins, diseases, drugs, chemicals.
  • 12. | 11 What Constitutes Big Data? • Very large data sets - Order of ~10^7 documents published (patents, journals, books) - Each document has ~200 sentences, giving ~10^9 statements. - Statements are about molecules, properties, reactions, indications etc. • Combinatorial connections between large data sets - “connecting the dots” among these facts results in a very large number of possible connections - $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ combinations of k elements chosen from a pool of n. Pathways • Relationships mined from 12,000 titles, 25M documents • <subject> <verb> <object> relationships • Each subject, object, verb has a taxonomy • Example: “protein” causes/induces disease Compounds • 16,000 journal titles plus patent offices • Compounds, Reactions, Properties • Over 6 million compounds with bioactivity Bioassays • Biological relationships mined from journals/patents (over 16 million) • <compound> <verb> <object> <quantity> • Example: Sunitinib binds-to Bcr-abl in <assay type> at 1nM
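
To give a feel for the combinatorial growth described on this slide, here is a minimal sketch using Python's standard `math.comb`. The pool size of 10^9 statements comes from the slide; the group sizes k = 2 and k = 3 are illustrative assumptions, not figures from the talk.

```python
import math


def possible_connections(n: int, k: int) -> int:
    """Number of ways to choose k elements from a pool of n: n! / (k! (n-k)!)."""
    return math.comb(n, k)


n_statements = 10**9  # order of magnitude quoted on the slide

# Illustrative group sizes (assumption): pairs and triples of statements.
for k in (2, 3):
    print(f"k={k}: {possible_connections(n_statements, k):.3e} possible combinations")
```

Even for pairs, the count is on the order of 10^17, which is why exhaustive "connect the dots" searches need filtering by taxonomy or relationship type.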
  • 13. | 12 Building and refining the disease model for hyperinsulinism. Picked relevant pathways (from a collection of 1800 models). Explored functions of proteins using 6.2M pre-text-mined relations and embedded Gene Ontology. Summarized what is known about CHI mechanism in an overview model.
  • 14. | 13 From pathways to treatments: automated analysis combines bioassay data with text-mined data. Step 1: Find all targets that could be used to affect the disease state. • 88 targets related to hyperinsulinism with ≥3 literature references • Full relationship information
  • 15. | 14 From pathways to treatments: automated analysis combines bioassay data with text-mined data. Step 1: Find all targets that could be used to affect the disease state. Step 2: Query for each protein to find compounds that target it (>6 log units). Data shown: Targets based on text mining; Approved compounds; Bioassay data.
  • 16. | 15 From pathways to treatments: automated analysis combines bioassay data with text-mined data. Step 1: Find all targets that could be used to affect the disease state. Step 2: Query for each protein to find compounds that target it (>6 log units). Step 3: Collate data by compound to summarize the targets/activities related to disease that the compound hits. • Compute geometric mean of activities for ranking • Rank by number of targets and geometric mean of activities against targets. Results shown: mean of activities among these targets; targets and activities for each compound; drug-likeness metrics for sorting/classification. • All compounds that were observed to bind to targets in pathway • Sorted by number of active targets. Too many targets may suggest lack of specificity.
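
The three steps on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the workflow used in the talk: the records, compound names and target names are invented; "log units" is read as pActivity = -log10(activity in mol/L); and the sort order (most targets first, then most potent geometric mean) is one plausible reading of "rank by number of targets and geometric mean of activities".

```python
from collections import defaultdict
from statistics import geometric_mean

# Hypothetical bioassay records: (compound, target, pActivity).
# All names and values are invented for illustration.
RECORDS = [
    ("cmpd_A", "T1", 8.0),
    ("cmpd_A", "T2", 7.0),
    ("cmpd_B", "T1", 9.0),
    ("cmpd_B", "T3", 5.5),  # below the 6 log unit cut-off, so dropped
    ("cmpd_C", "T2", 6.5),
    ("cmpd_C", "T3", 6.5),
]


def rank_compounds(records, disease_targets, threshold=6.0):
    """Return (compound, n_targets, geometric mean activity in nM), ranked."""
    # Step 2: keep compounds active (> threshold log units) on a disease target.
    by_compound = defaultdict(dict)
    for compound, target, p_act in records:
        if target in disease_targets and p_act > threshold:
            best = by_compound[compound].get(target, p_act)
            by_compound[compound][target] = max(best, p_act)

    # Step 3: collate per compound and compute the geometric mean activity.
    rows = []
    for compound, hits in by_compound.items():
        nanomolar = [10 ** (9 - p) for p in hits.values()]  # pActivity -> nM
        rows.append((compound, len(hits), geometric_mean(nanomolar)))

    # Assumed ranking: more targets first, then lower (more potent) mean.
    rows.sort(key=lambda row: (-row[1], row[2]))
    return rows


for row in rank_compounds(RECORDS, disease_targets={"T1", "T2", "T3"}):
    print(row)
```

As the slide notes, a very high target count may indicate lack of specificity, so in practice the first sort key might be capped or inverted depending on the question being asked.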
  • 17. | 16 Approved compounds that may treat hyperinsulinism • Each binds to one or more targets related to the disease • Can easily be obtained and tested in preclinical studies • List includes a compound known to treat hyperinsulinism, sirolimus
  • 18. | 17 Example: Process for Finding New Indications for a Drug (Ruxolitinib). Step 1: Find all targets for which the compound has high affinity (find all compound-protein/gene relationships with > 1 reference using text analysis). Step 2: Using the unique set of proteins from step 1, search for all diseases reported to be related to them. Step 3: Collate the diseases by targets and activity of the compound. Diagram labels: Targets inhibited; Targets Related to Disease.
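
The repositioning process on this slide is essentially a two-hop lookup, from compound to targets and from targets to diseases. A minimal sketch, assuming in-memory dictionaries in place of the text-mined and bioassay databases; the compound, protein and disease names, the activity values and the 6 log unit cut-off are all placeholders invented for illustration.

```python
from collections import defaultdict

# Placeholder data (invented): compound -> {target: pActivity}
COMPOUND_TARGETS = {"drug_X": {"P1": 8.5, "P2": 7.9, "P3": 5.0}}
# Placeholder data (invented): target -> diseases reported as related
TARGET_DISEASES = {
    "P1": {"disease_a", "disease_b"},
    "P2": {"disease_b"},
    "P3": {"disease_c"},
}


def candidate_indications(compound, min_p_activity=6.0):
    """Return [(disease, {target: pActivity})], most supporting targets first."""
    # Step 1: targets for which the compound has high affinity.
    targets = {
        target: p_act
        for target, p_act in COMPOUND_TARGETS.get(compound, {}).items()
        if p_act > min_p_activity
    }
    # Steps 2 and 3: look up diseases per target and collate by disease.
    by_disease = defaultdict(dict)
    for target, p_act in targets.items():
        for disease in TARGET_DISEASES.get(target, ()):
            by_disease[disease][target] = p_act
    return sorted(by_disease.items(), key=lambda item: (-len(item[1]), item[0]))


for disease, evidence in candidate_indications("drug_X"):
    print(disease, evidence)
```

Diseases supported by several high-affinity targets come first; whether such a hit is a real indication still needs expert review and experimental follow-up.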
  • 19. | 18 This Analysis Shows Connections of Ruxolitinib to Alopecia. A cancer drug that grows hair! Trials are under way. Reference: “Alopecia areata is driven by cytotoxic T lymphocytes and is reversed by JAK inhibition”, Nature Medicine 20, 1043–1049 (2014), doi:10.1038/nm.3645. Excerpt: “Global transcriptional profiling of mouse and human AA skin revealed gene expression signatures indicative of cytotoxic T cell infiltration, an interferon-γ (IFNG) response and upregulation of several γ-chain (γc) cytokines known to promote the activation and survival of IFN-γ–producing CD8+NKG2D+ effector T cells. Therapeutically, antibody-mediated blockade of IFN-γ, interleukin-2 (IL-2) or interleukin-15 receptor β (IL-15Rβ) prevented disease development, reducing the accumulation of CD8+NKG2D+ T cells in the skin and the dermal IFN response in a mouse model of AA.”
  • 20. | 19 Examples of Data Science in practice • Rare disease treatment: Highly curated data allows us to make predictions, but also required judgement in the building of the model • Translational safety: Concordance data is predictive; but also shows the importance of curating taxonomies • Evidence selection: In order to select the right information sets you need to be able to filter on the context of parameter based assertions (machine learning can help improve data selection) • Real World Data interpretation: Machine learning classification can be enhanced with taxonomies, but also deliver across multimodal data sets
  • 21. | 20 A translational safety big data analysis • Concordance between preclinical studies and human adverse events, based on the calculation of positive likelihood ratios. - Chi-squared tells us if there is a statistically significant relationship of any kind between the human and animal observations (which is used as a filter). - The likelihood ratio measures the predictive value of the animal observation.
  • 22. | 21 A translational safety big data analysis • If the chi-squared is high, and the likelihood ratio is low, one can state that there is high confidence that the animal observation does not predict human observation. • In which case the animal model should not be used.
  • 23. | 22 A translational safety big data analysis • If the chi-squared is high, and the likelihood ratio is high, one can state that there is high confidence that the animal observation does predict human observation. • In which case checks for adverse events can be added to clinical trials.
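
The chi-squared filter and positive likelihood ratio described on these slides can be computed from a 2x2 concordance table. The sketch below uses the standard textbook definitions (Pearson chi-squared without continuity correction; LR+ = sensitivity / (1 - specificity)); the slides do not state which variants were used, so this is an assumption. The counts are invented, and the LR+ cut-off is a hypothetical parameter because the slides only say "high" and "low".

```python
CHI2_CRITICAL = 3.841  # chi-squared critical value, 1 degree of freedom, p = 0.05


def concordance_stats(tp: int, fp: int, fn: int, tn: int):
    """Return (chi_squared, positive_likelihood_ratio) for a 2x2 table.

    tp: animal finding present, human adverse event present
    fp: animal finding present, human adverse event absent
    fn: animal finding absent,  human adverse event present
    tn: animal finding absent,  human adverse event absent
    """
    n = tp + fp + fn + tn
    denom = (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    if denom == 0:
        raise ValueError("every row and column needs at least one count")
    chi2 = n * (tp * tn - fp * fn) ** 2 / denom
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    if false_positive_rate == 0:
        return chi2, float("inf")
    return chi2, sensitivity / false_positive_rate


def interpret(chi2: float, lr_plus: float, lr_threshold: float = 1.0) -> str:
    """Map the two statistics to the cases on the slides (threshold assumed)."""
    if chi2 < CHI2_CRITICAL:
        return "no significant relationship"
    return "predictive" if lr_plus > lr_threshold else "not predictive"


# Invented counts, for illustration only.
chi2, lr_plus = concordance_stats(tp=30, fp=10, fn=20, tn=140)
print(round(chi2, 2), round(lr_plus, 2), interpret(chi2, lr_plus))
```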
  • 24. | 23 A translational safety big data analysis • Curation of taxonomy data. • The higher levels of the MedDRA hierarchy sometimes include such a variety of events that the additional false positives and negatives result in no statistical confidence in the relationship.
  • 25. | 24 Examples of Data Science in practice • Rare disease treatment: Highly curated data allows us to make predictions, but also required judgement in the building of the model • Translational safety: Concordance data is predictive; but also shows the importance of curating taxonomies • Evidence selection: In order to select the right information sets you need to be able to filter on the context of parameter based assertions (machine learning can help improve data selection) • Real World Data interpretation: Machine learning classification can be enhanced with taxonomies, but also deliver across multimodal data sets
  • 26. | 25 Cold mice problem • If we can interpret and classify complex parameter-based statements, this allows us to select the right data. • 22°C Cage (Standard Housing): Stress/Immune response to cold; Decreased response to chemotoxic drugs • 30°C Cage (Thermoneutrality): No Immune response to cold; Increased response to chemotoxic drugs
  • 27. | 26 Cold mice problem: results. Example statements: “All mice were maintained in a temperature controlled (22 ± 2 °C) environment 12-h light 12-h dark photocycle and fed rodent chow meal.” “The mice were individually placed into an acrylic cylinder (25 cm height 10 cm diameter) containing 8 cm of water maintained at 22–24 °C.” This allows research reports to be filtered based on whether results will be reliable due to experimental conditions.
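
As a rough illustration of what filtering on parameter-based statements involves, the sketch below pulls temperature mentions out of sentences like the ones on this slide with a regular expression and labels them against the two reference conditions from the previous slide (22°C standard housing, 30°C thermoneutrality). It is a simplification under stated assumptions: the nearest-reference labelling rule is invented here, and the regex cannot tell housing temperature from, say, water temperature in an assay, which is the contextual judgement the talk suggests machine learning can help with.

```python
import re

# A number, optionally followed by "± tolerance" or "– upper bound", then °C.
TEMP_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:(±|–|-)\s*(\d+(?:\.\d+)?))?\s*°\s*C"
)


def extract_temperatures(text: str):
    """Return a (low, high) range in °C for each temperature mention."""
    ranges = []
    for first, separator, second in TEMP_PATTERN.findall(text):
        value = float(first)
        if separator == "±":
            tolerance = float(second)
            ranges.append((value - tolerance, value + tolerance))
        elif separator:
            ranges.append((value, float(second)))
        else:
            ranges.append((value, value))
    return ranges


def housing_label(low: float, high: float) -> str:
    """Label a range by the nearer reference point (assumed rule)."""
    midpoint = (low + high) / 2
    if abs(midpoint - 22) <= abs(midpoint - 30):
        return "standard housing"
    return "thermoneutrality"


sentence = "All mice were maintained in a temperature controlled (22 ± 2 °C) environment"
for low, high in extract_temperatures(sentence):
    print(low, high, housing_label(low, high))
```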
  • 28. | 27 Use case examples • Rare disease treatment: Highly curated data allows us to make predictions, but also required judgement in the building of the model • Translational safety: Concordance data is predictive; but also shows the importance of curating taxonomies • Evidence selection: In order to select the right information sets you need to be able to filter on the context of parameter based assertions (machine learning can help improve data selection) • Real World Data interpretation: Machine learning classification can be enhanced with taxonomies, but also deliver across multimodal data sets
  • 29. | 28 Real World Data interpretation • Machine Learning: - Classify images. - Classify concepts (combining taxonomies with word embeddings improves performance on similarity measurement and entity classification). • Opportunities for developing multimodal classification of data sources with unstructured text and unlabelled images.
  • 30. | 29 Supporting Data Scientists to deliver results • These use case examples illustrate the challenges and creativity required to deliver Data Science. • We are developing a platform to help support these activities. - Good data: curated data. - Right data: export graph and feature data. - In-time data: bringing data sets together in a knowledge hub to enable in-time data.
  • 31. | 30 A platform for supporting Data Scientists • Inspiration exists, but it has to find you working. - Pablo Picasso • If you want to become a Data Science Platform development partner, or wish to hear more about continuing developments around Data Science at Elsevier please contact me: • www.linkedin.com/in/jabewilson/ • jabe.wilson@elsevier.com Studio, Tony Wilson (1973) my father. http://www.tonywilsonpainterprintmaker.com/
  • 32. | 31 Acknowledgements • Helena F. Deus, Corey Harper, Darin McBeath and Ron Daniel Jr – Elsevier Labs • Matthew Clark, Frederik van den Broek, Anton Yuryev, Maria Shkrob – Elsevier Professional services • Thomas Steger-Hartmann, Investigational Toxicology, Bayer AG