SlideShare une entreprise Scribd logo
1  sur  53
Télécharger pour lire hors ligne
© 2018 Ellen Friedman 11
Beyond the Algorithm: What Makes
Machine Learning Work?
Ellen Friedman, PhD
11 June 2018
Berlin Buzzwords #bbuzz
© 2018 Ellen Friedman 22
Contact Information
Ellen Friedman, PhD
Principal Technologist, MapR Technologies
Committer Apache Drill & Apache Mahout projects
O’Reilly author
Email efriedman@mapr.com ellenf@apache.org
Twitter @Ellen_Friedman #bbuzz
© 2018 Ellen Friedman 33
What makes machine learning
work?
© 2018 Ellen Friedman 44
+ = ?
© 2018 Ellen Friedman 55
Data Engineer Ian Downard
Had never tried machine learning, but he caught the bug…
© 2018 Ellen Friedman 66
Image from Wikipedia & used under Creative
Commons https://en.wikipedia.org/wiki/
File:Rhode_Island_Red_cock,_cropped.jpg
Image Recognition: Which Bird Is This?
Image from Wikipedia & used under Creative
Commons https://upload.wikimedia.org/wikipedia/
commons/7/74/
Barred_Plymouth_Rock_Rooster_001.jpg
Image from Wikipedia & used under Creative
Commons https://en.wikipedia.org/wiki/
Aphelocoma#/media/
File:WesternScrubJay2.jpg
Rhode Island Red Buff Orpington Jay
© 2018 Ellen Friedman 77
Image from Wikipedia & used under Creative
Commons https://en.wikipedia.org/wiki/
File:Rhode_Island_Red_cock,_cropped.jpg
More to the point…
Image from Wikipedia & used under Creative
Commons https://upload.wikimedia.org/wikipedia/
commons/7/74/
Barred_Plymouth_Rock_Rooster_001.jpg
Image from Wikipedia & used under Creative
Commons https://en.wikipedia.org/wiki/
Aphelocoma#/media/
File:WesternScrubJay2.jpg
Chicken Chicken Not a Chicken
© 2018 Ellen Friedman 88
Image from Wikipedia & used under Creative
Commons https://en.wikipedia.org/wiki/
File:Rhode_Island_Red_cock,_cropped.jpg
Domain Knowledge Matters
Image from Wikipedia & used under Creative
Commons https://upload.wikimedia.org/wikipedia/
commons/7/74/
Barred_Plymouth_Rock_Rooster_001.jpg
Image from Wikipedia & used under Creative
Commons https://en.wikipedia.org/wiki/
Aphelocoma#/media/
File:WesternScrubJay2.jpg
Chicken Chicken Predator
© 2018 Ellen Friedman 99
Tensor Chicken
Label
training
data
Run the
model
Deploy
model
Gather
training
data
Labeled
image files
Train
model
Update
model
Deep learning project using
Inception v3 model from
TensorFlow (see blog +
@tensorchicken)
© 2018 Ellen Friedman 1010
Value from ML: what about SLA’s?
How long does it take for image recognition model to classify
image?
•  ~30 seconds because just running on a Raspberry Pi
How long does it take for Scrub Jay to peck an egg?
•  < 30 seconds
•  Oops…
© 2018 Ellen Friedman 1111
Value from ML: what about action?
When image classification indicates a jay in the henhouse, what
would you do to chase away the predator?
•  Not sure yet. That’s a problem…
© 2018 Ellen Friedman 1212
What makes a difference for impact?
Image from Wikipedia & used under Creative
Commons https://upload.wikimedia.org/wikipedia/
commons/7/74/
Barred_Plymouth_Rock_Rooster_001.jpg
Left: labelled as Buff Orpington.
Right: This what a Buff Orpington really looks like.
Image from Wikipedia used under Creative Commons
https://en.wikipedia.org/wiki/Orpington_chicken#/media/
File:Coq_orpington_fauve.JPG
But…this error in domain
knowledge (wrong name) did
not matter for SLAs.
© 2018 Ellen Friedman 1313
What lessons can we learn from this toy project?
•  Image recognition & deep learning are cool
•  In some cases, building or training a model is simple
•  Domain knowledge matters (really)
•  Pay attention to SLAs
•  For real business value, have a plan of action in response to machine
learning insights (producing a report ≠ taking an action)
•  Software engineers have a role in machine learning (!)
© 2018 Ellen Friedman 1414
What about real world
examples?
© 2018 Ellen Friedman 1515
Domain Knowledge Matters: Video Recommender
•  Use clicks as input data: recommender gives poor performance
–  Model is testing the wrong preferences: how well people liked titles
•  Use first 30 seconds of viewing as input data: recommender
performance is good
–  Model now tests how well people liked the videos, not just the titles
© 2018 Ellen Friedman 1616
Domain Knowledge Matters: Detecting Security Attacks
Security expert at a bank preserved headers for web site
requests
© 2018 Ellen Friedman 1717
Spot the Important Difference?
GET	/personal/comparison-table.jsp?iODg2OQ=51a90	HTTP/1.1
Host:	www.sometarget.com
User-Agent:	Mozilla/4.0	(compatible;	MSIE	6.0;	Windows	NT	5.1;)
Accept-Encoding:	deflate
Accept-Charset:	UTF-8
Accept-Language:	fr
Cache-Control:	no-cache
Pragma:	no-cache
Connection:	Keep-Alive
GET	/photo.jpg	HTTP/1.1
Host:	lh4.googleusercontent.com
User-Agent:	Mozilla/5.0	(Macinto
Accept:	image/png,image/*;q=0.8,
Accept-Language:	en-US,en;q=0.5
Accept-Encoding:	gzip,	deflate,	
Referer:	https://www.google.com
Connection:	keep-alive
If-None-Match:	"v9”
Cache-Control:	max-age=0
Attacker request Real request
© 2018 Ellen Friedman 1818
Spot the Important Difference?
GET	/personal/comparison-table.jsp?iODg2OQ=51a90	HTTP/1.1
Host:	www.sometarget.com
User-Agent:	Mozilla/4.0	(compatible;	MSIE	6.0;	Windows	NT	5.1;)
Accept-Encoding:	deflate
Accept-Charset:	UTF-8
Accept-Language:	fr
Cache-Control:	no-cache
Pragma:	no-cache
Connection:	Keep-Alive
GET	/photo.jpg	HTTP/1.1
Host:	lh4.googleusercontent.com
User-Agent:	Mozilla/5.0	(Macinto
Accept:	image/png,image/*;q=0.8,
Accept-Language:	en-US,en;q=0.5
Accept-Encoding:	gzip,	deflate,	
Referer:	https://www.google.com
Connection:	keep-alive
If-None-Match:	"v9”
Cache-Control:	max-age=0
Attacker request Real request
© 2018 Ellen Friedman 1919
Another Example
GET	photo.jpg	HTTP/1.1
Host:	lh4.googleusercontent
User-agent:	Mozilla/5.0	(Ma
Accept:	image/png,image/*
Accept-language:	en-US,en
Accept-encoding:	gzip,	defl
Referer:	https://www.google
Connection:	keep-alive
If-none-match:	"v9”
Cache-control:	max-age=0
GET	cc/borken.json	HTTP/1.1
host:	c.qrs.my
user-agent:	Mozilla/4.0	(co
accept:	application/json,	t
accept-language:	en-US,en
accept-encoding:	gzip,	defl
referer:	none
connection:	keep-alive
if-none-match:	"v9”
cache-control:	max-age=0
Attacker requestReal request
© 2018 Ellen Friedman 2020
Another Example
GET	photo.jpg	HTTP/1.1
Host:	lh4.googleusercontent
User-agent:	Mozilla/5.0	(Ma
Accept:	image/png,image/*
Accept-language:	en-US,en
Accept-encoding:	gzip,	defl
Referer:	https://www.google
Connection:	keep-alive
If-none-match:	"v9”
Cache-control:	max-age=0
GET	cc/borken.json	HTTP/1.1
host:	c.qrs.my
user-agent:	Mozilla/4.0	(co
accept:	application/json,	t
accept-language:	en-US,en
accept-encoding:	gzip,	defl
referer:	none
connection:	keep-alive
if-none-match:	"v9”
cache-control:	max-age=0
Attacker requestReal request
© 2018 Ellen Friedman 2121
Domain Knowledge Matters: Detecting Security Attacks
Security expert at a bank preserved headers for web site
requests
•  Detected anomaly in headers for the attackers vs normal (real)
requests
•  Pattern of behavior for attackers was allowable for headers
and it was not predictable: but it was different
© 2018 Ellen Friedman 2222
Keep data: 

You don’t know what you’ll
need to know later
© 2018 Ellen Friedman 2323
All other mages © E. Friedman
.
Big Industry, Big Data, Big Value
©WesAbrams
Image courtesy Mtell used with permission
© 2018 Ellen Friedman 2424
Simple But Valuable: Accounting Audit Targeting
Big industrial company with a lot of machinery
•  Tracking actions & parts
–  label as repairs, contracted services, delivery of supplies, etc.
•  Which are to be taxed, expensed or counted as revenues
–  Mislabelling can cost millions of dollars
•  Use machine learning to target potential mislabelling for audit
review
•  Relatively simple models (pattern matching; exception
detection) deliver a huge business value.
© 2018 Ellen Friedman 2525
Is it the algorithm? the model? the ML tool?
https://mapr.com/blog/tensorflow-mxnet-caffe-h2o-which-ml-best/
© 2018 Ellen Friedman 2626
90% of the effort in successful
machine learning isn’t the
algorithm or the model…
It’s the logistics
© 2018 Ellen Friedman 2727
What Does Streaming Do for You?
Surfer on standing wave, Munich Image © 2017 Ellen Friedman
© 2018 Ellen Friedman 2828
Stream	
  transport	
  supports	
  
microservices	
  	
  	
  
© 2018 Ellen Friedman 2929
At the Heart: Message Transport
Real-time
analytics
EMR
Patient Facilities
management
Insurance
audit
A
B
Medical tests
C
Medical test
results
With the right messaging tool at
the heart of stream-1st
architecture you support other
classes of use cases (B & C)
© 2018 Ellen Friedman 3030
Stream Transport that Decouples Producers & Consumers
P
P
P
C
C
C
Transport Processing
Kafka /
MapR Streams
Good stream transport is persistent, performant & pervasive!
© 2018 Ellen Friedman 3131
Streaming Microservices
•  “Streaming Microservices” by Ted Dunning & Ellen Friedman, chapter in
Encyclopedia of Big Data Technologies, Sherif Sakr and Albert Zomaya,
editors, © 2018 (Springer International Publishing)
•  Chapter 3 of Streaming Architecture by Ted Dunning & Ellen Friedman ©
2016 (O’Reilly Media)
https://mapr.com/ebooks/streaming-architecture/chapter-03-streaming-platform-for-
microservices.html
© 2018 Ellen Friedman 3232
Get rid of the myth of the
unitary model
© 2018 Ellen Friedman 3333
Streaming microservices provides
flexibility & independence to
manage many models
© 2018 Ellen Friedman 3434
Logistics for machine learning can be difficult
•  Just getting the training & input data is hard
•  Many models to manage
•  Model-to-model evaluation needs to be convenient & accurate
•  Respond as the world changes: Deploy to production with agile
roll out & roll
•  There’s a need for good data engineering (not just data science)
© 2018 Ellen Friedman 3535
ResultsRendezvous
Enter Rendezvous…
Scores
ArchiveDecoy
m1
m2
m3
Features /
profiles
InputRaw
Rendezvous Architecture described in:
-  Machine Learning Logistics book by Ted Dunning & Ellen Friedman © 2018 (O’Reilly)
-  “Rendezvous Architecture” by Ted Dunning & Ellen Friedman, chapter in Encyclopedia of Big Data
Technologies. Sherif Sakr and Albert Zomaya, editors. Springer International Publishing, © 2018 in press.
© 2018 Ellen Friedman 3636
ResultsRendezvous
Rendezvous server making continuous decisions
Scores
ArchiveDecoy
m1
m2
m3
Features /
profiles
InputRaw
© 2018 Ellen Friedman 3737
Machine Learning Logistics:
Model Management in the Real World
Free copy online courtesy of MapR:
https://mapr.com/ebook/machine-learning-
logistics/
By Ted Dunning & Ellen Friedman © September 2017
© 2018 Ellen Friedman 3838
Decoy Model in the Rendezvous Architecture
Input
Scores
Decoy
Model 2
Model 3
Archive
•  Looks like a model, but it just archives inputs
•  Safe in a good streaming environment, less safe without good isolation
© 2018 Ellen Friedman 3939
Canary Model Helps Gauge Performance of New Models
Input
Scores
Decoy
Model 2
Model 3
Archive
•  Acts as a baseline reference for reasonable performance
•  Can be compared to performance of newly deployed models
Real
model
Result
Canary
Decoy
Archive
Input
© 2018 Ellen Friedman 4040
To	
  deploy	
  a	
  new	
  model,	
  just	
  
stop	
  ignoring	
  it	
  
© 2018 Ellen Friedman 4141
Rendezvous: Mainly for Decisioning Type Systems
•  Decisioning style machine learning
–  Looking for “right answer”
–  Simpler than interactive machine learning (think self-driving car)
•  Examples: fraud detection, predictive analytics; churn prediction
(telecom); deep learning (speech or image recognition)
© 2018 Ellen Friedman 4242
Update on Machine Learning Logistics
Rendezvous Architecture
–  Implemented by Terry McCann (Adatis) and Boris Lublinsky (Lightbend)
–  Soon to be put into a product by Finnish company Valonai
–  Large US financial company: data science team starting to plan how to
build rendezvous servers for each major project
–  “Rendezvous Architecture” is an entry in Springer’s Encyclopedia of Big
Data Technologies (© 2018)
© 2018 Ellen Friedman 4343
Update on Machine Learning Logistics
•  MLFlow
–  From DataBricks to focus on early part of logistics
https://databricks.com/blog/2018/06/05/introducing-mlflow-an-open-source-machine-
learning-platform.html
•  Clipper
–  From UC Berkeley Rise Lab; has deployment of models; some similar
goals to Rendezvous
https://rise.cs.berkeley.edu/projects/clipper/
© 2018 Ellen Friedman 4444
Be ready to adapt:
Conditions will change
© 2018 Ellen Friedman 4545
DataOps: Brings Flexibility & Focus
Platform&network
Operations
Softwareengineering
Architecture&planning
Dataengineering
Datascience
Productmanagement
DataOps Team B
DataOps Team A
Cross functional DataOps teams
•  You don’t have to be a data scientist to contribute to machine learning
•  Software engineer/ developer plays a role: but you need good data skills
© 2018 Ellen Friedman 4646
DataOps Principles
“DataOps teams seek to orchestrate data, tools, code and environments
from beginning to end.”
Thor Olavsrud interview with Ted Dunning & Ellen Friedman for CIO
https://www.cio.com/article/3237694/analytics/what-is-dataops-data-operations-analytics.html
by Ted Dunning 13 Sept 2017 in “InfoWorld”
https://www.infoworld.com/article/3223688/machine-
learning/machine-learning-skills-for-software-engineers.html
© 2018 Ellen Friedman 4747
Advantages of a DataOps Approach
•  Able to pivot & respond to real-world events as they happen
•  Improved efficiency and better use of people’s time
•  Faster time-to-value
•  A good fit to working with a global data fabric
© 2018 Ellen Friedman 4848
Machine Learning Logistics:
Model Management in the Real World
Free copy online courtesy of MapR:
https://mapr.com/ebook/machine-learning-
logistics/
By Ted Dunning & Ellen Friedman © September 2017
© 2018 Ellen Friedman 4949
Streaming Architecture:
New Designs Using Apache Kafka & MapR Streams
Free copy online courtesy of MapR:
http://bit.ly/mapr-streaming-architecture-book
By Ted Dunning & Ellen Friedman © March 2017
© 2018 Ellen Friedman 5050
Introduction to Apache Flink
Free copy online courtesy of MapR:
https://mapr.com/ebooks/intro-to-apache-
flink/
By Ellen Friedman & Kostas Tzoumas © September 2016
Ellen Friedman
& Kostas Tzoumas
Introduction
toApacheFlink
Stream Processing for
Real Time and Beyond
New ebook by
Ellen Friedman and
Kostas Tzoumas
In this book you’ll learn:
· What Apache Flink can do
· How it maintains consistency and provides flexibility
· How people are using it, including in production
· Best practices for streaming architectures
Download your copy:
mapr.com/flink-book
© 2018 Ellen Friedman 5151
Please support women in tech – help build
girls’ dreams of what they can accomplish
© Ellen Friedman 2015#womenintech #datawomen
© 2018 Ellen Friedman 5252
Thank you !
© 2018 Ellen Friedman 5353
Contact Information
Ellen Friedman, PhD
Principal Technologist, MapR Technologies
Committer Apache Drill & Apache Mahout projects
O’Reilly author
Email efriedman@mapr.com ellenf@apache.org
Twitter @Ellen_Friedman #bbuzz

Contenu connexe

Tendances

Tim Daines, QuantumBlack
Tim Daines, QuantumBlackTim Daines, QuantumBlack
Tim Daines, QuantumBlackMad*Pow
 
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Impetus Technologies
 
Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018TigerGraph
 
Jan van der Vegt. Challenges faced with machine learning in practice
Jan van der Vegt. Challenges faced with machine learning in practiceJan van der Vegt. Challenges faced with machine learning in practice
Jan van der Vegt. Challenges faced with machine learning in practiceLviv Startup Club
 
Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and AnalyticsVMware Tanzu
 
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...Kai Wähner
 
Self Guiding User Experience
Self Guiding User ExperienceSelf Guiding User Experience
Self Guiding User ExperienceSri Ambati
 
Predicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopPredicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopSkillspeed
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
 
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...Sri Ambati
 
Mike Tuche, CEO of Talend: Enabling the Data Driven Enterprise
Mike Tuche, CEO of Talend: Enabling the Data Driven EnterpriseMike Tuche, CEO of Talend: Enabling the Data Driven Enterprise
Mike Tuche, CEO of Talend: Enabling the Data Driven EnterpriseTalend
 
Embracing Cloud Agility to Maximize Flexibility & Performance
Embracing Cloud Agility to Maximize Flexibility & Performance Embracing Cloud Agility to Maximize Flexibility & Performance
Embracing Cloud Agility to Maximize Flexibility & Performance Talend
 
No more grid search! How to build models effectively by Thomas Huijskens
No more grid search! How to build models effectively by Thomas HuijskensNo more grid search! How to build models effectively by Thomas Huijskens
No more grid search! How to build models effectively by Thomas HuijskensSri Ambati
 
Elastic in oil and gas
Elastic in oil and gasElastic in oil and gas
Elastic in oil and gasDiego Escobar
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Impetus Technologies
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningKai Wähner
 
H2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData AmsterdamH2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData AmsterdamSri Ambati
 
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarReal-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarImpetus Technologies
 

Tendances (20)

Tim Daines, QuantumBlack
Tim Daines, QuantumBlackTim Daines, QuantumBlack
Tim Daines, QuantumBlack
 
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
 
Meetup Spark UDF performance
Meetup Spark UDF performanceMeetup Spark UDF performance
Meetup Spark UDF performance
 
Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018
 
Jan van der Vegt. Challenges faced with machine learning in practice
Jan van der Vegt. Challenges faced with machine learning in practiceJan van der Vegt. Challenges faced with machine learning in practice
Jan van der Vegt. Challenges faced with machine learning in practice
 
Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and Analytics
 
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
 
Self Guiding User Experience
Self Guiding User ExperienceSelf Guiding User Experience
Self Guiding User Experience
 
Predicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopPredicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via Hadoop
 
Smart App@Pivotal by Dat Tran
Smart App@Pivotal by Dat TranSmart App@Pivotal by Dat Tran
Smart App@Pivotal by Dat Tran
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
 
Mike Tuche, CEO of Talend: Enabling the Data Driven Enterprise
Mike Tuche, CEO of Talend: Enabling the Data Driven EnterpriseMike Tuche, CEO of Talend: Enabling the Data Driven Enterprise
Mike Tuche, CEO of Talend: Enabling the Data Driven Enterprise
 
Embracing Cloud Agility to Maximize Flexibility & Performance
Embracing Cloud Agility to Maximize Flexibility & Performance Embracing Cloud Agility to Maximize Flexibility & Performance
Embracing Cloud Agility to Maximize Flexibility & Performance
 
No more grid search! How to build models effectively by Thomas Huijskens
No more grid search! How to build models effectively by Thomas HuijskensNo more grid search! How to build models effectively by Thomas Huijskens
No more grid search! How to build models effectively by Thomas Huijskens
 
Elastic in oil and gas
Elastic in oil and gasElastic in oil and gas
Elastic in oil and gas
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
 
H2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData AmsterdamH2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData Amsterdam
 
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarReal-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
 

Similaire à What Makes Machine Learning Work? Berlin Buzzwords 2018 #bbuzz talk

Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...Amazon Web Services
 
Fabric for Deep Learning
Fabric for Deep LearningFabric for Deep Learning
Fabric for Deep LearningAnimesh Singh
 
Graph Gurus Episode 22: Guarding Against Cyber Security Threats with a Graph ...
Graph Gurus Episode 22: Guarding Against Cyber Security Threats with a Graph ...Graph Gurus Episode 22: Guarding Against Cyber Security Threats with a Graph ...
Graph Gurus Episode 22: Guarding Against Cyber Security Threats with a Graph ...Amanda Morris
 
Graph Gurus Episode 22: Cybersecurity
Graph Gurus Episode 22: CybersecurityGraph Gurus Episode 22: Cybersecurity
Graph Gurus Episode 22: CybersecurityTigerGraph
 
Federated Machine Learning Framework
Federated Machine Learning FrameworkFederated Machine Learning Framework
Federated Machine Learning FrameworkAnup kumar
 
Debunking Myths About Cloud Portability
Debunking Myths About Cloud PortabilityDebunking Myths About Cloud Portability
Debunking Myths About Cloud PortabilityVMware Tanzu
 
Symantec Webinar | How to Detect Targeted Ransomware with MITRE ATT&CK
Symantec Webinar | How to Detect Targeted Ransomware with MITRE ATT&CKSymantec Webinar | How to Detect Targeted Ransomware with MITRE ATT&CK
Symantec Webinar | How to Detect Targeted Ransomware with MITRE ATT&CKSymantec
 
Scaling up business value with real-time operational graph analytics
Scaling up business value with real-time operational graph analyticsScaling up business value with real-time operational graph analytics
Scaling up business value with real-time operational graph analyticsConnected Data World
 
Preparing the next generation for the cognitive era
Preparing the next generation for the cognitive era Preparing the next generation for the cognitive era
Preparing the next generation for the cognitive era Steven Miller
 
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Ac...
Flink Forward San Francisco 2018: Andrew Gao &  Jeff Sharpe - "Finding Bad Ac...Flink Forward San Francisco 2018: Andrew Gao &  Jeff Sharpe - "Finding Bad Ac...
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Ac...Flink Forward
 
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...Matt Stubbs
 
Part1: Introduction to Project Management
Part1: Introduction to Project ManagementPart1: Introduction to Project Management
Part1: Introduction to Project ManagementArry Arman
 
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...Kim Hammar
 
THE ESSENTIAL ELEMENT OF YOUR SECURITY
THE ESSENTIAL  ELEMENT OF YOUR SECURITYTHE ESSENTIAL  ELEMENT OF YOUR SECURITY
THE ESSENTIAL ELEMENT OF YOUR SECURITYETDAofficialRegist
 
Code to Cloud Workshop
Code to Cloud WorkshopCode to Cloud Workshop
Code to Cloud WorkshopJamie Coleman
 
Big Data Security Analytics (BDSA) with Randy Franklin
Big Data Security Analytics (BDSA) with Randy FranklinBig Data Security Analytics (BDSA) with Randy Franklin
Big Data Security Analytics (BDSA) with Randy FranklinSridhar Karnam
 
Code to Cloud Workshop.pptx
Code to Cloud Workshop.pptxCode to Cloud Workshop.pptx
Code to Cloud Workshop.pptxJamie Coleman
 
Code to Cloud Workshop, Shifting Security to the Left
Code to Cloud Workshop, Shifting Security to the LeftCode to Cloud Workshop, Shifting Security to the Left
Code to Cloud Workshop, Shifting Security to the LeftJamie Coleman
 
Top 10 tips for effective SOC/NOC collaboration or integration
Top 10 tips for effective SOC/NOC collaboration or integrationTop 10 tips for effective SOC/NOC collaboration or integration
Top 10 tips for effective SOC/NOC collaboration or integrationSridhar Karnam
 
Real Time Experiment Analytics at Pinterest with Apache Flink - Ben Liu & Par...
Real Time Experiment Analytics at Pinterest with Apache Flink - Ben Liu & Par...Real Time Experiment Analytics at Pinterest with Apache Flink - Ben Liu & Par...
Real Time Experiment Analytics at Pinterest with Apache Flink - Ben Liu & Par...Flink Forward
 

Similaire à What Makes Machine Learning Work? Berlin Buzzwords 2018 #bbuzz talk (20)

Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
 
Fabric for Deep Learning
Fabric for Deep LearningFabric for Deep Learning
Fabric for Deep Learning
 
Graph Gurus Episode 22: Guarding Against Cyber Security Threats with a Graph ...
Graph Gurus Episode 22: Guarding Against Cyber Security Threats with a Graph ...Graph Gurus Episode 22: Guarding Against Cyber Security Threats with a Graph ...
Graph Gurus Episode 22: Guarding Against Cyber Security Threats with a Graph ...
 
Graph Gurus Episode 22: Cybersecurity
Graph Gurus Episode 22: CybersecurityGraph Gurus Episode 22: Cybersecurity
Graph Gurus Episode 22: Cybersecurity
 
Federated Machine Learning Framework
Federated Machine Learning FrameworkFederated Machine Learning Framework
Federated Machine Learning Framework
 
Debunking Myths About Cloud Portability
Debunking Myths About Cloud PortabilityDebunking Myths About Cloud Portability
Debunking Myths About Cloud Portability
 
Symantec Webinar | How to Detect Targeted Ransomware with MITRE ATT&CK
Symantec Webinar | How to Detect Targeted Ransomware with MITRE ATT&CKSymantec Webinar | How to Detect Targeted Ransomware with MITRE ATT&CK
Symantec Webinar | How to Detect Targeted Ransomware with MITRE ATT&CK
 
Scaling up business value with real-time operational graph analytics
Scaling up business value with real-time operational graph analyticsScaling up business value with real-time operational graph analytics
Scaling up business value with real-time operational graph analytics
 
Preparing the next generation for the cognitive era
Preparing the next generation for the cognitive era Preparing the next generation for the cognitive era
Preparing the next generation for the cognitive era
 
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Ac...
Flink Forward San Francisco 2018: Andrew Gao &  Jeff Sharpe - "Finding Bad Ac...Flink Forward San Francisco 2018: Andrew Gao &  Jeff Sharpe - "Finding Bad Ac...
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Ac...
 
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
 
Part1: Introduction to Project Management
Part1: Introduction to Project ManagementPart1: Introduction to Project Management
Part1: Introduction to Project Management
 
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
 
THE ESSENTIAL ELEMENT OF YOUR SECURITY
THE ESSENTIAL  ELEMENT OF YOUR SECURITYTHE ESSENTIAL  ELEMENT OF YOUR SECURITY
THE ESSENTIAL ELEMENT OF YOUR SECURITY
 
Code to Cloud Workshop
Code to Cloud WorkshopCode to Cloud Workshop
Code to Cloud Workshop
 
Big Data Security Analytics (BDSA) with Randy Franklin
Big Data Security Analytics (BDSA) with Randy FranklinBig Data Security Analytics (BDSA) with Randy Franklin
Big Data Security Analytics (BDSA) with Randy Franklin
 
Code to Cloud Workshop.pptx
Code to Cloud Workshop.pptxCode to Cloud Workshop.pptx
Code to Cloud Workshop.pptx
 
Code to Cloud Workshop, Shifting Security to the Left
Code to Cloud Workshop, Shifting Security to the LeftCode to Cloud Workshop, Shifting Security to the Left
Code to Cloud Workshop, Shifting Security to the Left
 
Top 10 tips for effective SOC/NOC collaboration or integration
Top 10 tips for effective SOC/NOC collaboration or integrationTop 10 tips for effective SOC/NOC collaboration or integration
Top 10 tips for effective SOC/NOC collaboration or integration
 
Real Time Experiment Analytics at Pinterest with Apache Flink - Ben Liu & Par...
Real Time Experiment Analytics at Pinterest with Apache Flink - Ben Liu & Par...Real Time Experiment Analytics at Pinterest with Apache Flink - Ben Liu & Par...
Real Time Experiment Analytics at Pinterest with Apache Flink - Ben Liu & Par...
 

Dernier

Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 

Dernier (20)

Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 

What Makes Machine Learning Work? Berlin Buzzwords 2018 #bbuzz talk

  • 1. © 2018 Ellen Friedman 11 Beyond the Algorithm: What Makes Machine Learning Work? Ellen Friedman, PhD 11 June 2018 Berlin Buzzwords #bbuzz
  • 2. © 2018 Ellen Friedman 22 Contact Information Ellen Friedman, PhD Principal Technologist, MapR Technologies Committer Apache Drill & Apache Mahout projects O’Reilly author Email efriedman@mapr.com ellenf@apache.org Twitter @Ellen_Friedman #bbuzz
  • 3. © 2018 Ellen Friedman 33 What makes machine learning work?
  • 4. © 2018 Ellen Friedman 44 + = ?
  • 5. © 2018 Ellen Friedman 55 Data Engineer Ian Downard Had never tried machine learning, but he caught the bug…
  • 6. © 2018 Ellen Friedman 66 Image from Wikipedia & used under Creative Commons https://en.wikipedia.org/wiki/ File:Rhode_Island_Red_cock,_cropped.jpg Image Recognition: Which Bird Is This? Image from Wikipedia & used under Creative Commons https://upload.wikimedia.org/wikipedia/ commons/7/74/ Barred_Plymouth_Rock_Rooster_001.jpg Image from Wikipedia & used under Creative Commons https://en.wikipedia.org/wiki/ Aphelocoma#/media/ File:WesternScrubJay2.jpg Rhode Island Red Buff Orpington Jay
  • 7. © 2018 Ellen Friedman 77 Image from Wikipedia & used under Creative Commons https://en.wikipedia.org/wiki/ File:Rhode_Island_Red_cock,_cropped.jpg More to the point… Image from Wikipedia & used under Creative Commons https://upload.wikimedia.org/wikipedia/ commons/7/74/ Barred_Plymouth_Rock_Rooster_001.jpg Image from Wikipedia & used under Creative Commons https://en.wikipedia.org/wiki/ Aphelocoma#/media/ File:WesternScrubJay2.jpg Chicken Chicken Not a Chicken
  • 8. © 2018 Ellen Friedman 88 Image from Wikipedia & used under Creative Commons https://en.wikipedia.org/wiki/ File:Rhode_Island_Red_cock,_cropped.jpg Domain Knowledge Matters Image from Wikipedia & used under Creative Commons https://upload.wikimedia.org/wikipedia/ commons/7/74/ Barred_Plymouth_Rock_Rooster_001.jpg Image from Wikipedia & used under Creative Commons https://en.wikipedia.org/wiki/ Aphelocoma#/media/ File:WesternScrubJay2.jpg Chicken Chicken Predator
  • 9. © 2018 Ellen Friedman 99 Tensor Chicken Label training data Run the model Deploy model Gather training data Labeled image files Train model Update model Deep learning project using Inception v3 model from TensorFlow (see blog + @tensorchicken)
  • 10. © 2018 Ellen Friedman 1010 Value from ML: what about SLA’s? How long does it take for image recognition model to classify image? •  ~30 seconds because just running on a Raspberry Pi How long does it take for Scrub Jay to peck an egg? •  < 30 seconds •  Oops…
  • 11. © 2018 Ellen Friedman 1111 Value from ML: what about action? When image classification indicates a jay in the henhouse, what would you do to chase away the predator? •  Not sure yet. That’s a problem…
  • 12. © 2018 Ellen Friedman 1212 What makes a difference for impact? Image from Wikipedia & used under Creative Commons https://upload.wikimedia.org/wikipedia/ commons/7/74/ Barred_Plymouth_Rock_Rooster_001.jpg Left: labelled as Buff Orpington. Right: This what a Buff Orpington really looks like. Image from Wikipedia used under Creative Commons https://en.wikipedia.org/wiki/Orpington_chicken#/media/ File:Coq_orpington_fauve.JPG But…this error in domain knowledge (wrong name) did not matter for SLAs.
  • 13. © 2018 Ellen Friedman 1313 What lessons can we learn from this toy project? •  Image recognition & deep learning are cool •  In some cases, building or training a model is simple •  Domain knowledge matters (really) •  Pay attention to SLAs •  For real business value, have a plan of action in response to machine learning insights (producing a report ≠ taking an action) •  Software engineers have a role in machine learning (!)
  • 14. © 2018 Ellen Friedman 1414 What about real world examples?
  • 15. © 2018 Ellen Friedman 1515 Domain Knowledge Matters: Video Recommender •  Use clicks as input data: recommender gives poor performance –  Model is testing the wrong preferences: how well people liked titles •  Use first 30 seconds of viewing as input data: recommender performance is good –  Model now tests how well people liked the videos, not just the titles
  • 16. © 2018 Ellen Friedman 1616 Domain Knowledge Matters: Detecting Security Attacks Security expert at a bank preserved headers for web site requests
  • 17. © 2018 Ellen Friedman 1717 Spot the Important Difference? GET /personal/comparison-table.jsp?iODg2OQ=51a90 HTTP/1.1 Host: www.sometarget.com User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;) Accept-Encoding: deflate Accept-Charset: UTF-8 Accept-Language: fr Cache-Control: no-cache Pragma: no-cache Connection: Keep-Alive GET /photo.jpg HTTP/1.1 Host: lh4.googleusercontent.com User-Agent: Mozilla/5.0 (Macinto Accept: image/png,image/*;q=0.8, Accept-Language: en-US,en;q=0.5 Accept-Encoding: gzip, deflate, Referer: https://www.google.com Connection: keep-alive If-None-Match: "v9” Cache-Control: max-age=0 Attacker request Real request
  • 18. © 2018 Ellen Friedman 1818 Spot the Important Difference? GET /personal/comparison-table.jsp?iODg2OQ=51a90 HTTP/1.1 Host: www.sometarget.com User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;) Accept-Encoding: deflate Accept-Charset: UTF-8 Accept-Language: fr Cache-Control: no-cache Pragma: no-cache Connection: Keep-Alive GET /photo.jpg HTTP/1.1 Host: lh4.googleusercontent.com User-Agent: Mozilla/5.0 (Macinto Accept: image/png,image/*;q=0.8, Accept-Language: en-US,en;q=0.5 Accept-Encoding: gzip, deflate, Referer: https://www.google.com Connection: keep-alive If-None-Match: "v9” Cache-Control: max-age=0 Attacker request Real request
  • 19. © 2018 Ellen Friedman 1919 Another Example GET photo.jpg HTTP/1.1 Host: lh4.googleusercontent User-agent: Mozilla/5.0 (Ma Accept: image/png,image/* Accept-language: en-US,en Accept-encoding: gzip, defl Referer: https://www.google Connection: keep-alive If-none-match: "v9” Cache-control: max-age=0 GET cc/borken.json HTTP/1.1 host: c.qrs.my user-agent: Mozilla/4.0 (co accept: application/json, t accept-language: en-US,en accept-encoding: gzip, defl referer: none connection: keep-alive if-none-match: "v9” cache-control: max-age=0 Attacker requestReal request
  • 20. © 2018 Ellen Friedman 2020 Another Example GET photo.jpg HTTP/1.1 Host: lh4.googleusercontent User-agent: Mozilla/5.0 (Ma Accept: image/png,image/* Accept-language: en-US,en Accept-encoding: gzip, defl Referer: https://www.google Connection: keep-alive If-none-match: "v9” Cache-control: max-age=0 GET cc/borken.json HTTP/1.1 host: c.qrs.my user-agent: Mozilla/4.0 (co accept: application/json, t accept-language: en-US,en accept-encoding: gzip, defl referer: none connection: keep-alive if-none-match: "v9” cache-control: max-age=0 Attacker requestReal request
  • 21. © 2018 Ellen Friedman 2121 Domain Knowledge Matters: Detecting Security Attacks Security expert at a bank preserved headers for web site requests •  Detected anomaly in headers for the attackers vs normal (real) requests •  Pattern of behavior for attackers was allowable for headers and it was not predictable: but it was different
  • 22. © 2018 Ellen Friedman 2222 Keep data: 
 You don’t know what you’ll need to know later
  • 23. © 2018 Ellen Friedman 2323 All other mages © E. Friedman . Big Industry, Big Data, Big Value ©WesAbrams Image courtesy Mtell used with permission
  • 24. © 2018 Ellen Friedman 2424 Simple But Valuable: Accounting Audit Targeting Big industrial company with a lot of machinery •  Tracking actions & parts –  label as repairs, contracted services, delivery of supplies, etc. •  Which are to be taxed, expensed or counted as revenues –  Mislabelling can cost millions of dollars •  Use machine learning to target potential mislabelling for audit review •  Relatively simple models (pattern matching; exception detection) deliver a huge business value.
  • 25. © 2018 Ellen Friedman 2525 Is it the algorithm? the model? the ML tool? https://mapr.com/blog/tensorflow-mxnet-caffe-h2o-which-ml-best/
  • 26. © 2018 Ellen Friedman 2626 90% of the effort in successful machine learning isn’t the algorithm or the model… It’s the logistics
  • 27. © 2018 Ellen Friedman 2727 What Does Streaming Do for You? Surfer on standing wave, Munich Image © 2017 Ellen Friedman
  • 28. © 2018 Ellen Friedman 2828 Stream  transport  supports   microservices      
  • 29. © 2018 Ellen Friedman 2929 At the Heart: Message Transport Real-time analytics EMR Patient Facilities management Insurance audit A B Medical tests C Medical test results With the right messaging tool at the heart of stream-1st architecture you support other classes of use cases (B & C)
  • 30. © 2018 Ellen Friedman 3030 Stream Transport that Decouples Producers & Consumers P P P C C C Transport Processing Kafka / MapR Streams Good stream transport is persistent, performant & pervasive!
  • 31. © 2018 Ellen Friedman 3131 Streaming Microservices •  “Streaming Microservices” by Ted Dunning & Ellen Friedman, chapter in Encyclopedia of Big Data Technologies, Sherif Sakr and Albert Zomaya, editors, © 2018 (Springer International Publishing) •  Chapter 3 of Streaming Architecture by Ted Dunning & Ellen Friedman © 2016 (O’Reilly Media) https://mapr.com/ebooks/streaming-architecture/chapter-03-streaming-platform-for- microservices.html
  • 32. © 2018 Ellen Friedman 3232 Get rid of the myth of the unitary model
  • 33. © 2018 Ellen Friedman 3333 Streaming microservices provides flexibility & independence to manage many models
  • 34. © 2018 Ellen Friedman 3434 Logistics for machine learning can be difficult •  Just getting the training & input data is hard •  Many models to manage •  Model-to-model evaluation needs to be convenient & accurate •  Respond as the world changes: Deploy to production with agile roll out & roll •  There’s a need for good data engineering (not just data science)
  • 35. © 2018 Ellen Friedman 3535 ResultsRendezvous Enter Rendezvous… Scores ArchiveDecoy m1 m2 m3 Features / profiles InputRaw Rendezvous Architecture described in: -  Machine Learning Logistics book by Ted Dunning & Ellen Friedman © 2018 (O’Reilly) -  “Rendezvous Architecture” by Ted Dunning & Ellen Friedman, chapter in Encyclopedia of Big Data Technologies. Sherif Sakr and Albert Zomaya, editors. Springer International Publishing, © 2018 in press.
  • 36. © 2018 Ellen Friedman 3636 ResultsRendezvous Rendezvous server making continuous decisions Scores ArchiveDecoy m1 m2 m3 Features / profiles InputRaw
  • 37. © 2018 Ellen Friedman 3737 Machine Learning Logistics: Model Management in the Real World Free copy online courtesy of MapR: https://mapr.com/ebook/machine-learning- logistics/ By Ted Dunning & Ellen Friedman © September 2017
  • 38. © 2018 Ellen Friedman 3838 Decoy Model in the Rendezvous Architecture Input Scores Decoy Model 2 Model 3 Archive •  Looks like a model, but it just archives inputs •  Safe in a good streaming environment, less safe without good isolation
  • 39. © 2018 Ellen Friedman 3939 Canary Model Helps Gauge Performance of New Models Input Scores Decoy Model 2 Model 3 Archive •  Acts as a baseline reference for reasonable performance •  Can be compared to performance of newly deployed models Real model Result Canary Decoy Archive Input
  • 40. © 2018 Ellen Friedman 4040 To  deploy  a  new  model,  just   stop  ignoring  it  
  • 41. © 2018 Ellen Friedman 4141 Rendezvous: Mainly for Decisioning Type Systems •  Decisioning style machine learning –  Looking for “right answer” –  Simpler than interactive machine learning (think self-driving car) •  Examples: fraud detection, predictive analytics; churn prediction (telecom); deep learning (speech or image recognition)
  • 42. © 2018 Ellen Friedman 4242 Update on Machine Learning Logistics Rendezvous Architecture –  Implemented by Terry McCann (Adatis) and Boris Lublinsky (Lightbend) –  Soon to be put into a product by Finnish company Valonai –  Large US financial company: data science team starting to plan how to build rendezvous servers for each major project –  “Rendezvous Architecture” is an entry in Springer’s Encyclopedia of Big Data Technologies (© 2018)
  • 43. © 2018 Ellen Friedman 4343 Update on Machine Learning Logistics •  MLFlow –  From DataBricks to focus on early part of logistics https://databricks.com/blog/2018/06/05/introducing-mlflow-an-open-source-machine- learning-platform.html •  Clipper –  From UC Berkeley Rise Lab; has deployment of models; some similar goals to Rendezvous https://rise.cs.berkeley.edu/projects/clipper/
  • 44. © 2018 Ellen Friedman 4444 Be ready to adapt: Conditions will change
  • 45. © 2018 Ellen Friedman 4545 DataOps: Brings Flexibility & Focus Platform&network Operations Softwareengineering Architecture&planning Dataengineering Datascience Productmanagement DataOps Team B DataOps Team A Cross functional DataOps teams •  You don’t have to be a data scientist to contribute to machine learning •  Software engineer/ developer plays a role: but you need good data skills
  • 46. © 2018 Ellen Friedman 4646 DataOps Principles “DataOps teams seek to orchestrate data, tools, code and environments from beginning to end.” Thor Olavsrud interview with Ted Dunning & Ellen Friedman for CIO https://www.cio.com/article/3237694/analytics/what-is-dataops-data-operations-analytics.html by Ted Dunning 13 Sept 2017 in “InfoWorld” https://www.infoworld.com/article/3223688/machine- learning/machine-learning-skills-for-software-engineers.html
  • 47. © 2018 Ellen Friedman 4747 Advantages of a DataOps Approach •  Able to pivot & respond to real-world events as they happen •  Improved efficiency and better use of people’s time •  Faster time-to-value •  A good fit to working with a global data fabric
  • 48. © 2018 Ellen Friedman 4848 Machine Learning Logistics: Model Management in the Real World Free copy online courtesy of MapR: https://mapr.com/ebook/machine-learning- logistics/ By Ted Dunning & Ellen Friedman © September 2017
  • 49. © 2018 Ellen Friedman 4949 Streaming Architecture: New Designs Using Apache Kafka & MapR Streams Free copy online courtesy of MapR: http://bit.ly/mapr-streaming-architecture-book By Ted Dunning & Ellen Friedman © March 2017
  • 50. © 2018 Ellen Friedman 5050 Introduction to Apache Flink Free copy online courtesy of MapR: https://mapr.com/ebooks/intro-to-apache- flink/ By Ellen Friedman & Kostas Tzoumas © September 2016 Ellen Friedman & Kostas Tzoumas Introduction toApacheFlink Stream Processing for Real Time and Beyond New ebook by Ellen Friedman and Kostas Tzoumas In this book you’ll learn: · What Apache Flink can do · How it maintains consistency and provides flexibility · How people are using it, including in production · Best practices for streaming architectures Download your copy: mapr.com/flink-book
  • 51. © 2018 Ellen Friedman 5151 Please support women in tech – help build girls’ dreams of what they can accomplish © Ellen Friedman 2015#womenintech #datawomen
  • 52. © 2018 Ellen Friedman 5252 Thank you !
  • 53. © 2018 Ellen Friedman 5353 Contact Information Ellen Friedman, PhD Principal Technologist, MapR Technologies Committer Apache Drill & Apache Mahout projects O’Reilly author Email efriedman@mapr.com ellenf@apache.org Twitter @Ellen_Friedman #bbuzz