SlideShare une entreprise Scribd logo
1  sur  53
Télécharger pour lire hors ligne
© 2017 MapR Technologies 1
Machine Learning Model Management
© 2017 MapR Technologies 2
Contact Information
Ted Dunning, PhD
Chief Application Architect, MapR Technologies
Committer, PMC member, board member, ASF
O’Reilly author
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
© 2017 MapR Technologies 3
Machine Learning Everywhere
Image courtesy Mtell used with permission.Images © Ellen Friedman.
© 2017 MapR Technologies 4
Traditional View
© 2017 MapR Technologies 5
Traditional View: This isn’t the whole story
© 2017 MapR Technologies 6
90% of the effort in successful machine
learning isn’t in the training or model dev…
It’s the logistics
© 2017 MapR Technologies 7
Why?
• Just getting the training data is hard
– Which data? How to make it accessible? Multiple sources!
– New kinds of observations force restarts
– Requires a ton of domain knowledge
• The myth of the unitary model
– You can’t train just one
– You will have dozens of models, likely hundreds or more
– Handoff to new versions is tricky
– You have to get run-time to be sure about which is better

© 2017 MapR Technologies 8
What Machine Learning Tool is Best?
• Most successful groups keep several “favorite” machine
learning tools at hand
– No single tool is best in every situation
• The most important tool is a platform that supports logistics well
– Don’t have to do everything at the application level
– Lots of what matters can be handled at the platform level
• A good design for the logistics can make a big difference
© 2017 MapR Technologies 9
Some Gotchas
• Ops-oriented people will not “get it” regarding modeling
subtleties
• Data scientists will not “get it” regarding operational realities
• Therefore, modelers have to deliver self-contained models
• And, ops has to provide pre-wired structure
© 2017 MapR Technologies 10
Rendezvous Architecture
Input Scores
RendezvousModel 1
Model 2
Model 3
request
response
Results
© 2017 MapR Technologies 11
Rendezvous to the Rescue: Better ML Logistics
• Stream-1st architecture is a powerful approach with surprisingly
widespread advantages
– Innovative technologies emerging to for streaming data
• Microservices approach provides flexibility
– Streaming supports microservices (if done right)
• Containers remove surprises
– Predictable environment for running models
© 2017 MapR Technologies 12
Rendezvous: Mainly for Decisioning Engines
• Decisioning models
– Looking for a “right answer”
– Simpler than reinforcement learning
• Examples include:
– Fraud detection
– Predictive analytics / market prediction
– Churn prediction (as in telecommunications)
– Yield optimization
– Deep learning in form of speech or image recognition, in some cases
© 2017 MapR Technologies 13
Why Stream?
Munich surfing wave Image © 2017 Ellen Friedman
© 2017 MapR Technologies 14
Stream-1st Architecture: Basis for MicroServices
Stream instead of database as the shared “truth”
POS
1..n
Fraud
detector
Last card
use
Updater
Card
analytics
Other
card activity
Image © 2016 Ted Dunning & Ellen Friedman from Chap 6 of O’Reilly book Streaming Architecture used with permission
© 2017 MapR Technologies 15
Streaming Isolates Services
stream
Data
source
Consumer
© 2017 MapR Technologies 16
With MapR, Geo-Distributed Data Appears Local
stream
stream
Data
source
Consumer
© 2017 MapR Technologies 17
With MapR, Geo-distributed Data Appears Local
stream
stream
Data
source
ConsumerGlobal Data Center
Regional Data Center
© 2017 MapR Technologies 18
Features of Good Streaming
• It is Persistent
– Messages stick around for other consumers
– Consumers don’t affect producers
– Consumer doesn’t have to be online when message arrives
• It is Performant
– You don’t have to worry if a stream can keep up
• It is Pervasive
– It is there whenever you need it, no need to deploy anything
– How much work is it to create a new file? Why harder for a stream?
© 2017 MapR Technologies 19
Stream transport supports
microservices
© 2017 MapR Technologies 20
But we talked about decision
engines?!?
© 2017 MapR Technologies 21
What We Ultimately Want
request
response
Model
© 2017 MapR Technologies 22
But This Isn’t The Answer
Model 1
request
response
Load
balancer
Model 2
Model 3
© 2017 MapR Technologies 23
First Try with Streams
Input
Model 1
Model 2
Model 3
request
response
?
© 2017 MapR Technologies 24
First Rendezvous
Input Scores
RendezvousModel 1
Model 2
Model 3
request
response
Results
© 2017 MapR Technologies 25
Some Key Points
• Note that all models see identical inputs
• All models run in production setting
• All models send scores to same stream
• The rendezvous server decides which scores to ignore
• Roll forward, roll back, correlated comparison are all now trivial
© 2017 MapR Technologies 26
Reality Check, Injecting External State
Model 1
Model 2
Model 3
request
Raw
Add
external
data
Input
Database
The world
© 2017 MapR Technologies 27
Recording Raw Data (as it really was)
Input
Scores
Decoy
Model 2
Model 3
Archive
© 2017 MapR Technologies 28
Quality & Reproducibility of Input Data is Important!
• Recording raw-ish data is really a big deal
– Data as seen by a model is worth gold
– Data reconstructed later often has time-machine leaks
– Databases were made for updates, streams are safer
• Raw data is useful for non-ML cases as well (think flexibility)
• Decoy model records training data as seen by models under
development & evaluation
© 2017 MapR Technologies 29
Canary for Comparison
Real
model
∆
Result
Canary
Decoy
Archive
Input
© 2017 MapR Technologies 30
What Does the Canary Do?
• The canary is a real model, but is very rarely updated
• The canary results are almost never used for decisioning
• The virtue of the canary is stability
• Comparing to the canary results gives insight into new models
© 2017 MapR Technologies 31
Isolated Development With Stream Replication
Model 1
Model 2
Model 3
request
Raw
Add
external
data
Input
Internal 1
Internal 2
Internal 3
The world
Model 4
Raw
New
external
data
Input
Internal 4
Production
Development
© 2017 MapR Technologies 32
Scores
ArchiveDecoy
m1
m2
m3
Features /
profiles
InputRaw
© 2017 MapR Technologies 33
ResultsRendezvousScores
ArchiveDecoy
m1
m2
m3
Features /
profiles
InputRaw
© 2017 MapR Technologies 34
Metrics
Metrics
ResultsRendezvousScores
ArchiveDecoy
m1
m2
m3
Features /
profiles
InputRaw
© 2017 MapR Technologies 35
Models in production live in the real
world:
Conditions may (will) change
© 2017 MapR Technologies 36
Not Such Bad Ideas
• Keep models running “in the wings”
– Don’t wait until conditions change to start building the next model
– Keep new short-history models ready to roll, some graybeards as well
• Hot hand-off
– With rendezvous: just stop ignoring the new best model
• Deploy a canary server
– Keep an old model active as a reference
– If it was 90% correct, difference with any better model should be small
– Score distribution should be roughly constant
© 2017 MapR Technologies 37
Correlated Comparison of Score Quantiles
© 2017 MapR Technologies 38
Sample Model Cascade
A
B
Fraud
Fraud
Clean
Clean
Fraud
Assume that finding more frauds is all we care to do
© 2017 MapR Technologies 39
Some Data
© 2017 MapR Technologies 40
Consisting of Type 1
© 2017 MapR Technologies 41
And Type 2
© 2017 MapR Technologies 42
Sample Model Cascade
A
B
Fraud
Fraud
Clean
Clean
Fraud
Good with type 1
Good with type 2
© 2017 MapR Technologies 43
Baseline Conditions
• Model A
– 80% recall on type 1, 0% recall on type 2 (40% net)
• Model B
– 0% recall on type 1, 80% recall on type 2 (40% net)
• Combined
– No overlap in responses
– 80% recall on type 1 (due to model A)
– 80% recall on type 2 (due to model B)
– 80% recall overall
© 2017 MapR Technologies 44
“New and Improved”
• Suppose model A is “improved”
– Before: 80% recall on type 1, 0% recall on type 2 (40% net)
– After: 40% recall on type 1, 100% also on type 2 (70% net)
• Combined after change
– Huge overlap in responses
– 40% recall on type 1 (due to model A)
– 100% recall on type 2 (due to model A)
– Model B has no effect
– 70% recall overall
© 2017 MapR Technologies 45
Coupling Paradox
© 2017 MapR Technologies 46
Is There Any Hope?
• This kind of problem is HARD
– Do your competitor’s and your own marketing model couple?
• Where possible, use ensembles instead of cascades
– Not as simple as it sounds
• Where possible, deploy composite models as units
– Not as simple as it sounds
• Always measure everything!
© 2017 MapR Technologies 47
How to Do Better
• Data + the right question + domain knowledge matter!
• Prioritize – put serious effort into infrastructure
– DataOps requires more than just data science
• Persist – use streams to keep data around
• Measure – everything, and record it
• Meta-analyze – understand and see what is happening
• Containerize – make deployment repeatable, easy
• Oh… don’t forget to do some machine learning, too
© 2017 MapR Technologies 48
Additional Resources
O’Reilly report by Ted Dunning & Ellen Friedman © March 2017
Read free courtesy of MapR:
https://mapr.com/geo-distribution-big-data-and-analytics/
O’Reilly book by Ted Dunning & Ellen Friedman
© March 2016
Read free courtesy of MapR:
https://mapr.com/streaming-architecture-using-
apache-kafka-mapr-streams/
© 2017 MapR Technologies 49
Additional Resources
O’Reilly book by Ted Dunning & Ellen Friedman
© June 2014
Read free courtesy of MapR:
https://mapr.com/practical-machine-learning-
new-look-anomaly-detection/
O’Reilly book by Ellen Friedman & Ted Dunning
© February 2014
Read free courtesy of MapR:
https://mapr.com/practical-machine-learning/
© 2017 MapR Technologies 50
Additional Resources
by Ellen Friedman 8 Aug 2017 on MapR blog:
https://mapr.com/blog/tensorflow-mxnet-caffe-h2o-which-ml-best/
by Ted Dunning 13 Sept 2017 in
InfoWorld:
https://www.infoworld.com/article/3223
688/machine-learning/machine-
learning-skills-for-software-
engineers.html
© 2017 MapR Technologies 51
New book: Machine Learning Logistics
Model Management in the Real World
O’Reilly book by Ellen Friedman & Ted Dunning © Sept 2017
Pre-register for a free pdf copy of book when it becomes available 26th
September, courtesy of MapR
http://info.mapr.com/2017_Content_Machine-Learning-
Logistics_eBook_Prereg_RegistrationPage.html
Going to Strata Data NYC? Book will be released 26 Sept 2017:
Visit MapR booth for free book signings or to talk about logistics
© 2017 MapR Technologies 52
Please support women in tech – help build
girls’ dreams of what they can accomplish
© Ellen Friedman 2015#womenintech #datawomen
© 2017 MapR Technologies 53
Q&A
@mapr
tdunning@mapr.com
ENGAGE WITH US
@ Ted_Dunning

Contenu connexe

Tendances

Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteTed Dunning
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Ted Dunning
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeTed Dunning
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data SecurelyTed Dunning
 
How to tell which algorithms really matter
How to tell which algorithms really matterHow to tell which algorithms really matter
How to tell which algorithms really matterDataWorks Summit
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesTed Dunning
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?Ted Dunning
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossibleTed Dunning
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Carol McDonald
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning PrimerMathieu Dumoulin
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningTed Dunning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015Ted Dunning
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really MatterTed Dunning
 
Hadoop and R Go to the Movies
Hadoop and R Go to the MoviesHadoop and R Go to the Movies
Hadoop and R Go to the MoviesDataWorks Summit
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Ted Dunning
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to NewMapR Technologies
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareCarol McDonald
 
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBStructured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBCarol McDonald
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache MahoutTed Dunning
 

Tendances (20)

Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC Keynote
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data Securely
 
How to tell which algorithms really matter
How to tell which algorithms really matterHow to tell which algorithms really matter
How to tell which algorithms really matter
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approaches
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossible
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning Primer
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really Matter
 
Hadoop and R Go to the Movies
Hadoop and R Go to the MoviesHadoop and R Go to the Movies
Hadoop and R Go to the Movies
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health Care
 
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBStructured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache Mahout
 

Similaire à Machine Learning logistics

Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Big Data LDN 2017: Real World Impact of a Global Data Fabric
Big Data LDN 2017: Real World Impact of a Global Data FabricBig Data LDN 2017: Real World Impact of a Global Data Fabric
Big Data LDN 2017: Real World Impact of a Global Data FabricMatt Stubbs
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsEllen Friedman
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataMathieu Dumoulin
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksJustin Brandenburg
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Carol McDonald
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Mathieu Dumoulin
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Map r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetupMap r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetupAlan Iovine
 
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...Matt Stubbs
 
Demystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningDemystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningCarol McDonald
 
Big Data LDN 2017: Machine Learning: What Works And What They Won’t Tell You
Big Data LDN 2017: Machine Learning: What Works And What They Won’t Tell YouBig Data LDN 2017: Machine Learning: What Works And What They Won’t Tell You
Big Data LDN 2017: Machine Learning: What Works And What They Won’t Tell YouMatt Stubbs
 
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...Matt Stubbs
 
Big Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business SolutionsBig Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business SolutionsMatt Stubbs
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningMapR Technologies
 
Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionDataWorks Summit
 

Similaire à Machine Learning logistics (20)

Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Big Data LDN 2017: Real World Impact of a Global Data Fabric
Big Data LDN 2017: Real World Impact of a Global Data FabricBig Data LDN 2017: Real World Impact of a Global Data Fabric
Big Data LDN 2017: Real World Impact of a Global Data Fabric
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven Organizations
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural Networks
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Map r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetupMap r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetup
 
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
 
Demystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningDemystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep Learning
 
Big Data LDN 2017: Machine Learning: What Works And What They Won’t Tell You
Big Data LDN 2017: Machine Learning: What Works And What They Won’t Tell YouBig Data LDN 2017: Machine Learning: What Works And What They Won’t Tell You
Big Data LDN 2017: Machine Learning: What Works And What They Won’t Tell You
 
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
 
Big Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business SolutionsBig Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business Solutions
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in Action
 

Plus de Ted Dunning

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxTed Dunning
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with KubernetesTed Dunning
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in KubernetesTed Dunning
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forTed Dunning
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation TechnTed Dunning
 
My talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveMy talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveTed Dunning
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionTed Dunning
 
Building multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesBuilding multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesTed Dunning
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationTed Dunning
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7Ted Dunning
 

Plus de Ted Dunning (12)

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptx
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with Kubernetes
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
 
My talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveMy talk about recommendation and search to the Hive
My talk about recommendation and search to the Hive
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly Detection
 
Building multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesBuilding multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search engines
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for Recommendation
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 

Dernier

IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 

Dernier (20)

IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 

Machine Learning logistics

  • 1. © 2017 MapR Technologies 1 Machine Learning Model Management
  • 2. © 2017 MapR Technologies 2 Contact Information Ted Dunning, PhD Chief Application Architect, MapR Technologies Committer, PMC member, board member, ASF O’Reilly author Email tdunning@mapr.com tdunning@apache.org Twitter @Ted_Dunning
  • 3. © 2017 MapR Technologies 3 Machine Learning Everywhere Image courtesy Mtell used with permission.Images © Ellen Friedman.
  • 4. © 2017 MapR Technologies 4 Traditional View
  • 5. © 2017 MapR Technologies 5 Traditional View: This isn’t the whole story
  • 6. © 2017 MapR Technologies 6 90% of the effort in successful machine learning isn’t in the training or model dev… It’s the logistics
  • 7. © 2017 MapR Technologies 7 Why? • Just getting the training data is hard – Which data? How to make it accessible? Multiple sources! – New kinds of observations force restarts – Requires a ton of domain knowledge • The myth of the unitary model – You can’t train just one – You will have dozens of models, likely hundreds or more – Handoff to new versions is tricky – You have to get run-time to be sure about which is better 
  • 8. © 2017 MapR Technologies 8 What Machine Learning Tool is Best? • Most successful groups keep several “favorite” machine learning tools at hand – No single tool is best in every situation • The most important tool is a platform that supports logistics well – Don’t have to do everything at the application level – Lots of what matters can be handled at the platform level • A good design for the logistics can make a big difference
  • 9. © 2017 MapR Technologies 9 Some Gotchas • Ops-oriented people will not “get it” regarding modeling subtleties • Data scientists will not “get it” regarding operational realities • Therefore, modelers have to deliver self-contained models • And, ops has to provide pre-wired structure
  • 10. © 2017 MapR Technologies 10 Rendezvous Architecture Input Scores RendezvousModel 1 Model 2 Model 3 request response Results
  • 11. © 2017 MapR Technologies 11 Rendezvous to the Rescue: Better ML Logistics • Stream-1st architecture is a powerful approach with surprisingly widespread advantages – Innovative technologies emerging to for streaming data • Microservices approach provides flexibility – Streaming supports microservices (if done right) • Containers remove surprises – Predictable environment for running models
  • 12. © 2017 MapR Technologies 12 Rendezvous: Mainly for Decisioning Engines • Decisioning models – Looking for a “right answer” – Simpler than reinforcement learning • Examples include: – Fraud detection – Predictive analytics / market prediction – Churn prediction (as in telecommunications) – Yield optimization – Deep learning in form of speech or image recognition, in some cases
  • 13. © 2017 MapR Technologies 13 Why Stream? Munich surfing wave Image © 2017 Ellen Friedman
  • 14. © 2017 MapR Technologies 14 Stream-1st Architecture: Basis for MicroServices Stream instead of database as the shared “truth” POS 1..n Fraud detector Last card use Updater Card analytics Other card activity Image © 2016 Ted Dunning & Ellen Friedman from Chap 6 of O’Reilly book Streaming Architecture used with permission
  • 15. © 2017 MapR Technologies 15 Streaming Isolates Services stream Data source Consumer
  • 16. © 2017 MapR Technologies 16 With MapR, Geo-Distributed Data Appears Local stream stream Data source Consumer
  • 17. © 2017 MapR Technologies 17 With MapR, Geo-distributed Data Appears Local stream stream Data source ConsumerGlobal Data Center Regional Data Center
  • 18. © 2017 MapR Technologies 18 Features of Good Streaming • It is Persistent – Messages stick around for other consumers – Consumers don’t affect producers – Consumer doesn’t have to be online when message arrives • It is Performant – You don’t have to worry if a stream can keep up • It is Pervasive – It is there whenever you need it, no need to deploy anything – How much work is it to create a new file? Why harder for a stream?
  • 19. © 2017 MapR Technologies 19 Stream transport supports microservices
  • 20. © 2017 MapR Technologies 20 But we talked about decision engines?!?
  • 21. © 2017 MapR Technologies 21 What We Ultimately Want request response Model
  • 22. © 2017 MapR Technologies 22 But This Isn’t The Answer Model 1 request response Load balancer Model 2 Model 3
  • 23. © 2017 MapR Technologies 23 First Try with Streams Input Model 1 Model 2 Model 3 request response ?
  • 24. © 2017 MapR Technologies 24 First Rendezvous Input Scores RendezvousModel 1 Model 2 Model 3 request response Results
  • 25. © 2017 MapR Technologies 25 Some Key Points • Note that all models see identical inputs • All models run in production setting • All models send scores to same stream • The rendezvous server decides which scores to ignore • Roll forward, roll back, correlated comparison are all now trivial
  • 26. © 2017 MapR Technologies 26 Reality Check, Injecting External State Model 1 Model 2 Model 3 request Raw Add external data Input Database The world
  • 27. © 2017 MapR Technologies 27 Recording Raw Data (as it really was) Input Scores Decoy Model 2 Model 3 Archive
  • 28. © 2017 MapR Technologies 28 Quality & Reproducibility of Input Data is Important! • Recording raw-ish data is really a big deal – Data as seen by a model is worth gold – Data reconstructed later often has time-machine leaks – Databases were made for updates, streams are safer • Raw data is useful for non-ML cases as well (think flexibility) • Decoy model records training data as seen by models under development & evaluation
  • 29. © 2017 MapR Technologies 29 Canary for Comparison Real model ∆ Result Canary Decoy Archive Input
  • 30. © 2017 MapR Technologies 30 What Does the Canary Do? • The canary is a real model, but is very rarely updated • The canary results are almost never used for decisioning • The virtue of the canary is stability • Comparing to the canary results gives insight into new models
  • 31. © 2017 MapR Technologies 31 Isolated Development With Stream Replication Model 1 Model 2 Model 3 request Raw Add external data Input Internal 1 Internal 2 Internal 3 The world Model 4 Raw New external data Input Internal 4 Production Development
  • 32. © 2017 MapR Technologies 32 Scores ArchiveDecoy m1 m2 m3 Features / profiles InputRaw
  • 33. © 2017 MapR Technologies 33 ResultsRendezvousScores ArchiveDecoy m1 m2 m3 Features / profiles InputRaw
  • 34. © 2017 MapR Technologies 34 Metrics Metrics ResultsRendezvousScores ArchiveDecoy m1 m2 m3 Features / profiles InputRaw
  • 35. © 2017 MapR Technologies 35 Models in production live in the real world: Conditions may (will) change
  • 36. © 2017 MapR Technologies 36 Not Such Bad Ideas • Keep models running “in the wings” – Don’t wait until conditions change to start building the next model – Keep new short-history models ready to roll, some graybeards as well • Hot hand-off – With rendezvous: just stop ignoring the new best model • Deploy a canary server – Keep an old model active as a reference – If it was 90% correct, difference with any better model should be small – Score distribution should be roughly constant
  • 37. © 2017 MapR Technologies 37 Correlated Comparison of Score Quantiles
  • 38. © 2017 MapR Technologies 38 Sample Model Cascade A B Fraud Fraud Clean Clean Fraud Assume that finding more frauds is all we care to do
  • 39. © 2017 MapR Technologies 39 Some Data
  • 40. © 2017 MapR Technologies 40 Consisting of Type 1
  • 41. © 2017 MapR Technologies 41 And Type 2
  • 42. © 2017 MapR Technologies 42 Sample Model Cascade A B Fraud Fraud Clean Clean Fraud Good with type 1 Good with type 2
  • 43. © 2017 MapR Technologies 43 Baseline Conditions • Model A – 80% recall on type 1, 0% recall on type 2 (40% net) • Model B – 0% recall on type 1, 80% recall on type 2 (40% net) • Combined – No overlap in responses – 80% recall on type 1 (due to model A) – 80% recall on type 2 (due to model B) – 80% recall overall
  • 44. © 2017 MapR Technologies 44 “New and Improved” • Suppose model A is “improved” – Before: 80% recall on type 1, 0% recall on type 2 (40% net) – After: 40% recall on type 1, 100% also on type 2 (70% net) • Combined after change – Huge overlap in responses – 40% recall on type 1 (due to model A) – 100% recall on type 2 (due to model A) – Model B has no effect – 70% recall overall
  • 45. © 2017 MapR Technologies 45 Coupling Paradox
  • 46. © 2017 MapR Technologies 46 Is There Any Hope? • This kind of problem is HARD – Do your competitor’s and your own marketing model couple? • Where possible, use ensembles instead of cascades – Not as simple as it sounds • Where possible, deploy composite models as units – Not as simple as it sounds • Always measure everything!
  • 47. © 2017 MapR Technologies 47 How to Do Better • Data + the right question + domain knowledge matter! • Prioritize – put serious effort into infrastructure – DataOps requires more than just data science • Persist – use streams to keep data around • Measure – everything, and record it • Meta-analyze – understand and see what is happening • Containerize – make deployment repeatable, easy • Oh… don’t forget to do some machine learning, too
  • 48. © 2017 MapR Technologies 48 Additional Resources O’Reilly report by Ted Dunning & Ellen Friedman © March 2017 Read free courtesy of MapR: https://mapr.com/geo-distribution-big-data-and-analytics/ O’Reilly book by Ted Dunning & Ellen Friedman © March 2016 Read free courtesy of MapR: https://mapr.com/streaming-architecture-using- apache-kafka-mapr-streams/
  • 49. © 2017 MapR Technologies 49 Additional Resources O’Reilly book by Ted Dunning & Ellen Friedman © June 2014 Read free courtesy of MapR: https://mapr.com/practical-machine-learning- new-look-anomaly-detection/ O’Reilly book by Ellen Friedman & Ted Dunning © February 2014 Read free courtesy of MapR: https://mapr.com/practical-machine-learning/
  • 50. © 2017 MapR Technologies 50 Additional Resources by Ellen Friedman 8 Aug 2017 on MapR blog: https://mapr.com/blog/tensorflow-mxnet-caffe-h2o-which-ml-best/ by Ted Dunning 13 Sept 2017 in InfoWorld: https://www.infoworld.com/article/3223 688/machine-learning/machine- learning-skills-for-software- engineers.html
  • 51. © 2017 MapR Technologies 51 New book: Machine Learning Logistics Model Management in the Real World O’Reilly book by Ellen Friedman & Ted Dunning © Sept 2017 Pre-register for a free pdf copy of book when it becomes available 26th September, courtesy of MapR http://info.mapr.com/2017_Content_Machine-Learning- Logistics_eBook_Prereg_RegistrationPage.html Going to Strata Data NYC? Book will be released 26 Sept 2017: Visit MapR booth for free book signings or to talk about logistics
  • 52. © 2017 MapR Technologies 52 Please support women in tech – help build girls’ dreams of what they can accomplish © Ellen Friedman 2015#womenintech #datawomen
  • 53. © 2017 MapR Technologies 53 Q&A @mapr tdunning@mapr.com ENGAGE WITH US @ Ted_Dunning