From POC to Production in Minimal Time – Avoiding Pain in ML Projects
Dr Janet Bastiman @yssybyl
StoryStream.ai
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/poc-ml/
Presented at QCon San Francisco
www.qconsf.com

About StoryStream
The world’s leading automotive content platform
StoryStream is a dedicated automotive content platform, trusted by some of the world’s leading car brands. It was created specifically to help automotive brands provide a more relevant, engaging customer experience, fuelled with authentic content and designed for efficiently scaling content operations across global teams.
The Core StoryStream Benefits
● Grow customer engagement and conversions by up to 25%
● Reduce content creation and management costs by up to 60%
● Provide a more authentic customer experience
● Understand your customer in a deeper way

“[Client] needs this to go live at the end of the month, I promised them we could deliver...”
Every salesperson ever

Project timings
● 35 models = 1050 days (one person, linear)
● ~5 years for one person working Mon-Fri - who is allowed holidays :)
● 250 days with parallelisation of tasks and data upfront
● 150 days on worksheet, balanced by an increase in ongoing license
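
The arithmetic behind these figures can be checked with a short sketch. The 30 days per model follows from the slide (1050 / 35). The working-year length and the 33 days of leave are assumptions for illustration, not numbers from the talk.

```python
# Back-of-envelope check of the project timings.
MODELS = 35
TOTAL_DAYS = 1050                      # from the slide: one person, linear

days_per_model = TOTAL_DAYS / MODELS   # working days per model

# Assumption: 52 weeks x 5 working days, minus a hypothetical 33 days of
# annual leave and public holidays.
WORKING_DAYS_PER_YEAR = 52 * 5 - 33

years_one_person = TOTAL_DAYS / WORKING_DAYS_PER_YEAR


def speedup(baseline_days: float, plan_days: float) -> float:
    """How many times shorter a plan is than the linear baseline."""
    return baseline_days / plan_days


print(f"{days_per_model:.0f} days per model")
print(f"{years_one_person:.1f} years for one person")
print(f"parallelised plan: {speedup(TOTAL_DAYS, 250):.1f}x shorter")
print(f"worksheet plan:    {speedup(TOTAL_DAYS, 150):.1f}x shorter")
```

Under these assumptions the single-person estimate lands between four and five years, which is consistent with the "~5 years" on the slide.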

Can you guess what happened next?

What would it take to get it done in that time?
Image: The Core (2003), Paramount Pictures

“They don’t have any data to give us”

If you are dealing with any critical inferencing, do not take shortcuts. Do it properly and rigorously, stand up to the company and say no. Make sure it’s clear that the timelines will be longer to get it right.

Without Data ML is just a Random Result
● Legal public sources
  ● https://github.com/awesomedata/awesome-public-datasets
  ● https://www.kaggle.com/datasets
● Take your own pictures/videos
  ● Access/permission?
  ● Slow and inconsistent
● Scrape the client site with permission

How much data?
• Vision: 1000 images per output class, but this depends on the complexity of the problem
• Time series: at least double the time period over which you are predicting, but be cautious of data becoming irrelevant
• Text: very variable depending on the problem
• This also changes if you already have pre-trained networks that you’re updating
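
These rules of thumb are easy to turn into a quick check before training starts. The sketch below is illustrative only: the function names and the example class labels are hypothetical, and the thresholds are the slide's starting points rather than hard limits.

```python
from collections import Counter

# Rule of thumb from the slide. The right number depends on the problem
# and changes when updating a pre-trained network.
TARGET_PER_CLASS = 1000


def classes_short_of_target(labels, target=TARGET_PER_CLASS):
    """Return {class: images still needed} for classes below the target."""
    counts = Counter(labels)
    return {cls: target - n for cls, n in sorted(counts.items()) if n < target}


def min_history(forecast_period):
    """Time series: at least double the period being predicted."""
    return 2 * forecast_period


# Hypothetical label list for three vehicle classes.
labels = ["hatchback"] * 1200 + ["saloon"] * 400 + ["estate"] * 950
print(classes_short_of_target(labels))
print(min_history(30), "days of history for a 30-day forecast")
```

Running a check like this on the raw label list shows early which classes need more collection or augmentation.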

What do you do with the Data?
● Selection bias
● Random Sampling
● Overcoverage
● Undercoverage
● Measurement (Response) error
● Processing errors
● Participation bias

What do you do with the Data?
Diagram: Photos, Scrape -> S3 bucket
● Unique filename
● Source
● Set uuid (if multiple images of same car)
● Date taken
● S3 bucket per vehicle variant
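
The fields listed above map naturally onto a small record type. This is a minimal sketch, not the schema used in the talk: the class name, the helper, and the bucket naming pattern are all hypothetical.

```python
import uuid
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass
class ImageRecord:
    filename: str             # unique filename
    source: str               # e.g. "photo" or "scrape"
    set_uuid: Optional[str]   # shared by multiple images of the same car
    date_taken: date
    bucket: str               # one S3 bucket per vehicle variant


def new_record(source, date_taken, variant, set_uuid=None, ext="jpg"):
    """Create a record with a collision-resistant filename."""
    filename = f"{uuid.uuid4().hex}.{ext}"
    bucket = f"vehicles-{variant}"   # hypothetical naming pattern
    return ImageRecord(filename, source, set_uuid, date_taken, bucket)


# Two photos of the same car share a set uuid but get distinct filenames.
shoot = uuid.uuid4().hex
a = new_record("photo", date(2019, 10, 1), "hatchback-2019", set_uuid=shoot)
b = new_record("photo", date(2019, 10, 1), "hatchback-2019", set_uuid=shoot)
print(asdict(a))
```

Keeping the set uuid makes it possible to keep all images of one car on the same side of a train/test split.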

What do you do with the Data?
Diagram: Photos, Scrape -> Car Detector -> S3 Bucket -> Manual verification
● Extra field for label
● S3 bucket name became mostly irrelevant

Crowdsource labelling
https://xkcd.com/1897/

Data Pipeline
Diagram: Data In -> Object detector -> Images saved -> Auxiliary info saved -> Temp public access -> Extract for Turk -> Import of results -> Dashboard -> Expert clean -> Data Ready
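
The slide does not say how crowdsourced results were combined before the expert clean. One common approach, shown here purely as an assumption, is a majority vote with an agreement threshold: images the workers agree on are accepted, the rest are queued for an expert. The function name, threshold, and image ids are hypothetical.

```python
from collections import Counter


def aggregate_votes(votes_by_image, min_agreement=0.7):
    """Split images into accepted labels and ones needing expert review.

    votes_by_image maps an image id to the list of labels workers gave it.
    """
    accepted, needs_expert = {}, []
    for image_id, votes in votes_by_image.items():
        if not votes:
            needs_expert.append(image_id)
            continue
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= min_agreement:
            accepted[image_id] = label
        else:
            needs_expert.append(image_id)
    return accepted, needs_expert


votes = {
    "img-001": ["car", "car", "car"],
    "img-002": ["car", "van", "truck"],
    "img-003": ["van", "van", "car", "van"],
}
accepted, needs_expert = aggregate_votes(votes)
print(accepted)
print(needs_expert)
```

Raising `min_agreement` sends more images to the expert and lowers label noise, so the threshold is a cost trade-off.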

Transfer Learning
● Use transfer learning - fix most of the weights of a good network and adapt the last few layers
● Fast and easy retraining and works with smaller data sets in a variety of fields
  ● (image) https://arxiv.org/abs/1903.02196
  ● (series) https://arxiv.org/abs/1907.01332
  ● (audio) https://arxiv.org/abs/1909.07526
Deep Learning for Vision Systems, Mohamed Elgendy
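
The idea of fixing most weights and adapting only the last layer can be shown without any deep learning framework. In this sketch the "pre-trained" backbone is just a fixed random projection, which is an assumption for illustration. A real backbone would come from a trained network, and only the small head would be retrained in the same way.

```python
import math
import random

rng = random.Random(0)

# Stand-in for the frozen layers of a good network (hypothetical weights).
N_IN, N_FEAT = 2, 4
BACKBONE = [[rng.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_FEAT)]


def features(x):
    """Frozen layers: never updated during retraining."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in BACKBONE]


def predict(head, bias, f):
    z = sum(w * fi for w, fi in zip(head, f)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def loss(head, bias, data):
    """Mean binary cross-entropy over (features, label) pairs."""
    eps = 1e-12
    total = 0.0
    for f, y in data:
        p = predict(head, bias, f)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train_head(data, epochs=200, lr=0.5):
    """Adapt only the last layer with full-batch gradient descent."""
    head, bias = [0.0] * N_FEAT, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * N_FEAT, 0.0
        for f, y in data:
            err = predict(head, bias, f) - y
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        head = [w - lr * g / len(data) for w, g in zip(head, grad_w)]
        bias -= lr * grad_b / len(data)
    return head, bias


# Small hypothetical data set: label is 1 when the two inputs sum to > 0.
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(40)]
data = [(features(p), 1 if p[0] + p[1] > 0 else 0) for p in points]

backbone_before = [row[:] for row in BACKBONE]
start = loss([0.0] * N_FEAT, 0.0, data)
head, bias = train_head(data)
end = loss(head, bias, data)
print(f"loss before: {start:.3f}, after: {end:.3f}")
```

Only `N_FEAT + 1` numbers are trained here, which is why this style of retraining is fast and can work with small data sets.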

Unbalanced Data

https://www.designhacks.co/products/cognitive-bias-codex-poster

Stand on the shoulders of giants…
● For some problems CNNs are robust to noisy labels: up to 20 times as many noisy labels as real labels can still give business-level accuracy
  https://arxiv.org/pdf/1705.10694.pdf
● Find the right architecture
  http://www.asimovinstitute.org/neural-network-zoo/

Go old school
Reduce the dimensionality of the problem and use a Bayesian approach, KNN or SVM
https://xkcd.com/2059/
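
As a rough illustration of the "old school" route, the sketch below reduces each input to a couple of averaged values and classifies with k-nearest neighbours. The chunk-averaging reduction and the toy "dark"/"light" data are assumptions chosen to keep the example small. In practice something like PCA or hand-picked features would play the same role.

```python
import math
from collections import Counter


def reduce_dims(vector, n_bins):
    """Crude dimensionality reduction: average the vector in n_bins chunks."""
    size = len(vector) // n_bins
    return [sum(vector[i * size:(i + 1) * size]) / size for i in range(n_bins)]


def knn_predict(train, query, k=3):
    """train is a list of (features, label) pairs. Returns the majority
    label among the k nearest neighbours by Euclidean distance."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical 8-value inputs, reduced to 2 values each before KNN.
raw = [
    ([0.1, 0.2, 0.1, 0.0, 0.2, 0.1, 0.0, 0.1], "dark"),
    ([0.0, 0.1, 0.2, 0.1, 0.1, 0.0, 0.2, 0.1], "dark"),
    ([0.2, 0.1, 0.0, 0.1, 0.0, 0.2, 0.1, 0.2], "dark"),
    ([0.9, 0.8, 0.9, 1.0, 0.8, 0.9, 1.0, 0.9], "light"),
    ([1.0, 0.9, 0.8, 0.9, 0.9, 1.0, 0.8, 0.9], "light"),
    ([0.8, 0.9, 1.0, 0.9, 1.0, 0.8, 0.9, 0.8], "light"),
]
train = [(reduce_dims(v, 2), label) for v, label in raw]

query = reduce_dims([0.7, 0.9, 0.8, 0.8, 0.9, 0.7, 0.8, 0.9], 2)
print(knn_predict(train, query))
```

There is no training step at all here, which is part of the appeal when time is short.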

Choose wisely

Simplify the problem
● Removal of camera artefacts in eye images to make detection easier - Jeffrey De Fauw
  http://blog.kaggle.com/2015/08/10/detecting-diabetic-retinopathy-in-eye-images/
● Diagram: Image -> Specific Vehicle, versus Image -> Car? -> Make? -> Specific Vehicle
● Removal of Doppler effect on moving source using fractional octave band shifting, F Mobley
  https://asa.scitation.org/doi/pdf/10.1121/2.0000578?class=pdf
  $\Delta n = -r\left[\log_2\left(1 - M\cos\theta\sin\varphi\right)\right]$
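
Splitting one hard question ("which specific vehicle?") into a chain of easier ones can be sketched as a cascade that stops at the first miss. The classifier functions below are hypothetical stand-ins that read hand-written attributes. In a real system each stage would be a trained model.

```python
from typing import Callable, Dict, Optional


def cascade(image, is_car: Callable, make_of: Callable,
            model_for: Dict[str, Callable]) -> Optional[str]:
    """Image -> Car? -> Make? -> specific vehicle, stopping early on a miss."""
    if not is_car(image):
        return None                       # not a car: nothing more to do
    make = make_of(image)
    classify_model = model_for.get(make)
    if classify_model is None:
        return None                       # make not covered by any model
    return classify_model(image)          # small per-make classifier


# Hypothetical stand-ins: each "image" is a dict of hand-written attributes.
def is_car(img):
    return img.get("wheels") == 4


def make_of(img):
    return img.get("badge")


def acme_model(img):
    return "acme-roadster" if img.get("doors") == 2 else "acme-estate"


model_for = {"acme": acme_model}

print(cascade({"wheels": 4, "badge": "acme", "doors": 2}, is_car, make_of, model_for))
print(cascade({"wheels": 2}, is_car, make_of, model_for))
```

Each stage has fewer output classes than a single end-to-end classifier would, so each can be trained on less data.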

Get every last drop from what you have
Have a toolkit of augmentation approaches but choose what’s relevant to your needs...
Statistical anatomical modelling for efficient and personalised spine biomechanical models - I Castro Mateos PhD thesis

Augmentation - detail
● Flip L/R U/D
● Rotations
● Reduce or enlarge bounding box coordinates by N%
● Add occlusions
  https://www.umbc.edu/rssipl/people/aplaza/Papers/Journals/2019.GRSL.Occlusion.pdf
● Change hue, saturation and value of colours in the image
  https://arxiv.org/pdf/1902.06543.pdf
● Copypairing - https://arxiv.org/abs/1909.00390#
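
Several of these augmentations fit in a few lines each. The sketch below works on images stored as nested lists and uses only the standard library. Scaling the box about its centre is one interpretation of "reduce or enlarge by N%", and all function names are hypothetical.

```python
import colorsys


def flip_lr(img):
    """Mirror left/right."""
    return [row[::-1] for row in img]


def flip_ud(img):
    """Mirror up/down."""
    return img[::-1]


def rotate90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def scale_box(box, percent):
    """Enlarge (percent > 0) or reduce (percent < 0) a box about its centre.

    box is (x_min, y_min, x_max, y_max).
    """
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * percent / 100 / 2
    dy = (y1 - y0) * percent / 100 / 2
    return (x0 - dx, y0 - dy, x1 + dx, y1 + dy)


def occlude(img, top, left, height, width, fill=0):
    """Return a copy with a filled rectangle hiding part of the image."""
    out = [row[:] for row in img]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = fill
    return out


def shift_hsv(pixel, dh=0.0, ds=1.0, dv=1.0):
    """pixel is (r, g, b) in 0..1. Shift hue, scale saturation and value."""
    h, s, v = colorsys.rgb_to_hsv(*pixel)
    return colorsys.hsv_to_rgb((h + dh) % 1.0, min(s * ds, 1.0), min(v * dv, 1.0))


img = [[1, 2, 3],
       [4, 5, 6]]
print(flip_lr(img))
print(rotate90(img))
print(scale_box((10, 10, 30, 20), 10))
print(occlude(img, 0, 1, 1, 2))
```

Not every transform suits every problem. For example, an up/down flip rarely produces a realistic car photo, which is the point of choosing what is relevant.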

Infrastructure
Diagram: Data In, Data Store, Taxonomy, Classifier Definition, Test Set, DockerHub Setup, Codeship Project, GitHub Setup, Notification, Slack, Email, Template, AWS Image, Scripts, Dashboard

Cloud Formation

Automation
Diagram: Delete local data, Build container, Get model and key, Run test harness, Validate container, Run container, Report results, Dashboard, Commit, Build new Container

Stack Automation
Diagram: Add new container -> Start stack -> Run stack test harness -> Compare results -> Better?
● Yes: Create docs, Update CF, Live
● No: Human investigation
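
The "Better?" decision is the part of this flow that is simplest to express in code. The sketch below assumes the stack test harness produces a dictionary of scores where higher is better. The function name, the metric name, and the `min_gain` margin are hypothetical.

```python
def promote_if_better(candidate, live, metric="accuracy", min_gain=0.0):
    """Compare stack test-harness results and return the next steps.

    candidate and live are dicts of metric name -> score (higher is better).
    """
    if candidate[metric] > live[metric] + min_gain:
        return ["create docs", "update CF", "go live"]
    return ["human investigation"]


live = {"accuracy": 0.91}
print(promote_if_better({"accuracy": 0.93}, live))
print(promote_if_better({"accuracy": 0.90}, live))
```

A non-zero `min_gain` would stop the stack from being redeployed for improvements that are within the noise of the test set.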

Automatic Documentation
Diagram: LaTeX templates, Pweave, .tex files and images, Save with model files, Convert to PDF, Run LaTeX, If live, save in live docs, Email to team
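
The template-filling step can be illustrated with the standard library alone. This sketch uses `string.Template` as a stand-in and is not Pweave's API. The template text, function name, and model name are hypothetical, and running LaTeX, converting to PDF, and emailing are left out.

```python
from string import Template

# Minimal LaTeX report template. $model_name and $accuracy are placeholders
# filled in by Python; the template deliberately contains no LaTeX math,
# because "$" is the placeholder marker for string.Template.
REPORT = Template(r"""\documentclass{article}
\begin{document}
\section*{Model report: $model_name}
Accuracy on the test set: $accuracy\%.
\end{document}
""")


def render_report(model_name, accuracy):
    """Return the .tex source for one model's report."""
    return REPORT.substitute(
        model_name=model_name,
        accuracy=f"{accuracy * 100:.1f}",
    )


tex = render_report("make-classifier", 0.934)
print(tex)
```

Generating the report from the same run that produced the results keeps the documentation in step with the model files it is saved with.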

Did we make it?
● Some really difficult images
● Only expected images were given
● Where it was wrong it was (mostly) sensibly wrong
● Client happy
● Cool automated system

The Playbook
ai-playbook.com

Thank You
https://xkcd.com/2191/