SlideShare une entreprise Scribd logo
1  sur  27
Dr. David Talby
@davidtalby
WHEN MODELS GO ROGUE:
HARD EARNED LESSONS ABOUT USING
MACHINE LEARNING IN PRODUCTION
MODEL DEVELOPMENT ≠ SOFTWARE DEVELOPMENT
1.
The moment you put
a model in production,
it starts degrading
Medical claims > 4.7 Billion
Pharmacy claims > 1.2 Billion
Providers > 500,000
Patients > 120 million
• Locality (epidemics)
• Seasonality
• Changes in the hospital / population
• Impact of deploying the system
• Combination of all of the above
CONCEPT DRIFT: AN EXAMPLE
Never Changing Always Changing
Online Social
Networking
Models/Rules
Banking &
eCommerce
fraud
Automated trading
Real-time ad bidding
Natural
Language, Social
Behavior Models
Cyber
Security
Physical models:
Face recognition
Voice recognition
Climate models
Google or
Amazon
Search
models
(MUCH MORE THAN ON YOUR ALGORITHM)
HOW FAST DEPENDS ON THE PROBLEM
Political &
Economic
Models
Never Changing Always Changing
Automated
ensemble, boosting
& feature selection
techniques
Automated
‘challenger’
online
evaluation &
deployment
Real-time online
learning via passive
feedback
Hand-crafted
machine learned
models
Active
learning via
active
feedback
Hand
Crafted
Rules
Daily/weekly
batch retraining
SO PUT THE RIGHT MACHINERY IN PLACE
Traditional
Scientific Method:
Test a Hypothesis
Never Changing Always Changing
Automated
ensemble, boosting
& feature selection
techniques
Automated
‘challenger’
online
evaluation &
deployment
Real-time online
learning via passive
feedback
Hand-crafted
machine learned
models
Active
learning via
active
feedback
Hand
Crafted
Rules
Daily/weekly
batch retraining
… WHEN YOU’RE ALLOWED
Traditional
Scientific Method:
Test a Hypothesis
8
Hidden Feedback Loops
Undeclared Consumers
Data dependencies that bite
[D. Sculley et al., Google, December 2014]
WATCH FOR SELF INFLICTED CHANGES
9
Monitor that the distribution of predicted labels is equal to
the distribution of observed labels
BEST PRACTICE: NEW LEVEL OF MONITORING
2.
You rarely get to deploy
the same model twice
Cotter PE, Bhalla VK, Wallis SJ, Biram RW. Predicting readmissions:
Poor performance of the LACE index in an older UK population.
Age Ageing. 2012 Nov;41(6):784-9.
Model Model’s Goal Sample size Context
Charlson morbidity index
(1987)
1-year mortality 607
1 hospital in NYC,
April 1984
Elixhauser morbidity index
(1998)
Hospital charges, length of stay &
in-hospital mortality
1,779,167
438 hospitals in CA,
1992
LACE index
(van Walraven et al., 2010)
30-day mortality or readmission 4,812
11 hospitals in Ontario,
2002-2006
REUSE METHODOLOGY, NOT MODELS
• Clinical coding for two outpatient radiology clinics
• Train on one clinic, use & measure that model on both
0
0.2
0.4
0.6
0.8
1
Precision Recall
Specialized Non-Specialized
DON’T ASSUME WHAT MAKES A DIFFERENCE
Infer procedure (CPT)
90% overlap
3.
It’s really hard to know
how well you’re doing
“it seemed we were only seeing about 10%-15% of the
predicted lift, so we decided to run a little experiment.
And that’s when the wheels totally flew off the bus.”
14
[Peter Borden, SumAll, June 2014]
HOW OPTIMIZELY (ALMOST) GOT ME FIRED
15
[Alice Zheng, Dato, June 2015]
separation of experiences
How many false positives
can we tolerate?
What does the p-value
mean?
Which metric?
How many observations
do we need?
Multiple models,
multiple hypotheses
How much change counts
as real change?
Is the distribution of the
metric Gaussian?
How long to run the test?
One- or two-sided test? Are the variances equal? Catching distribution drift
THE PITFALLS OF A/B TESTING
[Ron Kohavi et al., Microsoft, August 2012]
The Primacy Effect
The Novelty Effect
Best Practice:
A/A Testing
FIVE PUZZLING OUTCOMES EXPLAINED
Regression to the Mean
4.
Often, the real modeling work
only starts in production
SEMI SUPERVISED LEARNING
IN NUMBERS
50+99.9999% 6+
Months
per case
‘Good’ messages
Schemes
(and counting)
ADVERSARIAL LEARNING
ADVERSARIAL LEARNING
5.
Your best people are
needed on the project
after going to production
DESIGN
Most important,
hardest to change
technical decisions
are made here.
BUILD & TEST
Riskiest & most
reused code
components are
built and tested
first.
DEPLOY
First
deployment is
hands-on, then
we automate it
and iterate to
build lower-
priority features.
OPERATE
Ongoing, repetitive
tasks are either
automated away
or handed off to
support &
operations.
SOFTWARE DEVELOPMENT
MODEL
Feature engineering,
model selection &
optimization are done
for the 1st model built.
DEPLOY &
MEASURE
Online metrics is
key in production,
since results will
often defer from
off-line ones.
EXPERIMENT
Design & run as
many experiments,
as fast as possible,
with new inputs,
features &
feedback.
AUTOMATE
Automate the retrain
or active learning
pipeline, including
online metrics &
labeled data
collection.
MODEL DEVELOPMENT
To conclude…
Rethink your development process
Rethink your role in the business
Think “run a hospital”, not “build a hospital”
MODEL DEVELOPMENT ≠ SOFTWARE DEVELOPMENT
@davidtalby

Contenu connexe

En vedette

Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsTuri, Inc.
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in productionTuri, Inc.
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operationsStepan Pushkarev
 
Python as part of a production machine learning stack by Michael Manapat PyDa...
Python as part of a production machine learning stack by Michael Manapat PyDa...Python as part of a production machine learning stack by Michael Manapat PyDa...
Python as part of a production machine learning stack by Michael Manapat PyDa...PyData
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In ProductionSamir Bessalah
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelinesjeykottalam
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architectureStepan Pushkarev
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017Carol Smith
 

En vedette (9)

Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning Models
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operations
 
Python as part of a production machine learning stack by Michael Manapat PyDa...
Python as part of a production machine learning stack by Michael Manapat PyDa...Python as part of a production machine learning stack by Michael Manapat PyDa...
Python as part of a production machine learning stack by Michael Manapat PyDa...
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architecture
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
 

Plus de David Talby

Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...David Talby
 
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldTurning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldDavid Talby
 
How to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsHow to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsDavid Talby
 
New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022David Talby
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
 
Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021David Talby
 
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...David Talby
 
Natural Language Understanding in Healthcare
Natural Language Understanding in HealthcareNatural Language Understanding in Healthcare
Natural Language Understanding in HealthcareDavid Talby
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 editionDavid Talby
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understandingDavid Talby
 
Build your open source data science platform
Build your open source data science platformBuild your open source data science platform
Build your open source data science platformDavid Talby
 
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...David Talby
 
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection SystemArchitecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection SystemDavid Talby
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...David Talby
 

Plus de David Talby (14)

Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...
 
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldTurning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
 
How to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsHow to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical Trials
 
New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021
 
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
 
Natural Language Understanding in Healthcare
Natural Language Understanding in HealthcareNatural Language Understanding in Healthcare
Natural Language Understanding in Healthcare
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understanding
 
Build your open source data science platform
Build your open source data science platformBuild your open source data science platform
Build your open source data science platform
 
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
 
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection SystemArchitecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
 

Dernier

VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROmotivationalword821
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 

Dernier (20)

VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTRO
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 

When Models Go Rogue: Hard Earned Lessons About Using Machine Learning in Production

  • 1. Dr. David Talby @davidtalby WHEN MODELS GO ROGUE: HARD EARNED LESSONS ABOUT USING MACHINE LEARNING IN PRODUCTION
  • 2. MODEL DEVELOPMENT ≠ SOFTWARE DEVELOPMENT
  • 3. 1. The moment you put a model in production, it starts degrading
  • 4. Medical claims > 4.7 Billion Pharmacy claims > 1.2 Billion Providers > 500,000 Patients > 120 million • Locality (epidemics) • Seasonality • Changes in the hospital / population • Impact of deploying the system • Combination of all of the above CONCEPT DRIFT: AN EXAMPLE
  • 5. Never Changing Always Changing Online Social Networking Models/Rules Banking & eCommerce fraud Automated trading Real-time ad bidding Natural Language, Social Behavior Models Cyber Security Physical models: Face recognition Voice recognition Climate models Google or Amazon Search models (MUCH MORE THAN ON YOUR ALGORITHM) HOW FAST DEPENDS ON THE PROBLEM Political & Economic Models
  • 6. Never Changing Always Changing Automated ensemble, boosting & feature selection techniques Automated ‘challenger’ online evaluation & deployment Real-time online learning via passive feedback Hand-crafted machine learned models Active learning via active feedback Hand Crafted Rules Daily/weekly batch retraining SO PUT THE RIGHT MACHINERY IN PLACE Traditional Scientific Method: Test a Hypothesis
  • 7. Never Changing Always Changing Automated ensemble, boosting & feature selection techniques Automated ‘challenger’ online evaluation & deployment Real-time online learning via passive feedback Hand-crafted machine learned models Active learning via active feedback Hand Crafted Rules Daily/weekly batch retraining … WHEN YOU’RE ALLOWED Traditional Scientific Method: Test a Hypothesis
  • 8. 8 Hidden Feedback Loops Undeclared Consumers Data dependencies that bite [D. Sculley et al., Google, December 2014] WATCH FOR SELF INFLICTED CHANGES
  • 9. 9 Monitor that the distribution of predicted labels is equal to the distribution of observed labels BEST PRACTICE: NEW LEVEL OF MONITORING
  • 10. 2. You rarely get to deploy the same model twice
  • 11. Cotter PE, Bhalla VK, Wallis SJ, Biram RW. Predicting readmissions: Poor performance of the LACE index in an older UK population. Age Ageing. 2012 Nov;41(6):784-9. Model Model’s Goal Sample size Context Charlson morbidity index (1987) 1-year mortality 607 1 hospital in NYC, April 1984 Elixhauser morbidity index (1998) Hospital charges, length of stay & in-hospital mortality 1,779,167 438 hospitals in CA, 1992 LACE index (van Walraven et al., 2010) 30-day mortality or readmission 4,812 11 hospitals in Ontario, 2002-2006 REUSE METHODOLOGY, NOT MODELS
  • 12. • Clinical coding for two outpatient radiology clinics • Train on one clinic, use & measure that model on both 0 0.2 0.4 0.6 0.8 1 Precision Recall Specialized Non-Specialized DON’T ASSUME WHAT MAKES A DIFFERENCE Infer procedure (CPT) 90% overlap
  • 13. 3. It’s really hard to know how well you’re doing
  • 14. “it seemed we were only seeing about 10%-15% of the predicted lift, so we decided to run a little experiment. And that’s when the wheels totally flew off the bus.” 14 [Peter Borden, SumAll, June 2014] HOW OPTIMIZELY (ALMOST) GOT ME FIRED
  • 15. 15 [Alice Zheng, Dato, June 2015] separation of experiences How many false positives can we tolerate? What does the p-value mean? Which metric? How many observations do we need? Multiple models, multiple hypotheses How much change counts as real change? Is the distribution of the metric Gaussian? How long to run the test? One- or two-sided test? Are the variances equal? Catching distribution drift THE PITFALLS OF A/B TESTING
  • 16. [Ron Kohavi et al., Microsoft, August 2012] The Primacy Effect The Novelty Effect Best Practice: A/A Testing FIVE PUZZLING OUTCOMES EXPLAINED Regression to the Mean
  • 17. 4. Often, the real modeling work only starts in production
  • 19. IN NUMBERS 50+99.9999% 6+ Months per case ‘Good’ messages Schemes (and counting)
  • 22. 5. Your best people are needed on the project after going to production
  • 23. DESIGN Most important, hardest to change technical decisions are made here. BUILD & TEST Riskiest & most reused code components are built and tested first. DEPLOY First deployment is hands-on, then we automate it and iterate to build lower- priority features. OPERATE Ongoing, repetitive tasks are either automated away or handed off to support & operations. SOFTWARE DEVELOPMENT
  • 24. MODEL Feature engineering, model selection & optimization are done for the 1st model built. DEPLOY & MEASURE Online metrics is key in production, since results will often defer from off-line ones. EXPERIMENT Design & run as many experiments, as fast as possible, with new inputs, features & feedback. AUTOMATE Automate the retrain or active learning pipeline, including online metrics & labeled data collection. MODEL DEVELOPMENT
  • 26. Rethink your development process Rethink your role in the business Think “run a hospital”, not “build a hospital” MODEL DEVELOPMENT ≠ SOFTWARE DEVELOPMENT

Notes de l'éditeur

  1. Michigan case: more than 1,000 patients, $35M paid for chemo of healthy patients