ML and Data Science
at Uber
Sudhir Tonse, Engineering Lead, Uber
Feb 18, 2017
GITPro 2017
Where do we want to go today?
Agenda
• Introduction: Who am I and what are we talking about today?
• Problem Space: Why does Uber need ML and what are some of the problems we tackle?
• Tools of the Trade: What does Uber’s tech stack look like?
• Challenges & Opportunities: Challenges likely unique to Uber .. interesting opportunities
Hop on the Uber ML Ride … destination please?
Uber, this talk and me the speaker
Introduction
•Engineering Leader @ Uber
•Marketplace Data
•Realtime Data Processing
•Analytics
•Forecasting
•Previously: MicroServices/Cloud Platform at Netflix
•Twitter @stonse
Who am I?
Uber’s logistics platform
Marketplace
• Driver Partner: Our partner in the ride sharing business
• Riders: Folks like you and me who request a ride on any of Uber’s transportation products, e.g. UberX, uberPool
• Merchants: Restaurants or shops that have signed on to the Uber platform
Introduction
Uber Mission
“Transportation as reliable as running water, everywhere, for everyone”
• Mapping (Routes, ETAs, …)
• Fraud and Security
• uberEATS Recommendations
• Marketplace Optimizations
• Forecasting
• Driver Positioning
• Health, Trends, Issues, ...
• And more …
ML Problems
Why do we need Machine Learning?
ETA, Route Optimization,
Pickup Points, Pool rider
matches
Marketplace
Build the platform, products, and algorithms responsible for the real time execution and online optimization of Uber's marketplace.
We are building the brain of Uber, solving NP-hard algorithmic and economic optimization problems at scale.
Uber | Marketplace
Mission
Request Event -> Driver Accept Event -> Trip Started Event -> more events …
Overall Flow
Match Services
Trip States
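One way to picture trip states is as a small state machine driven by the events named on this slide. The sketch below is illustrative only: the state names, event keys, and the `apply_event` and `replay` helpers are assumptions, not Uber's actual implementation.

```python
from enum import Enum


class TripState(Enum):
    REQUESTED = "requested"
    ACCEPTED = "accepted"
    STARTED = "started"


# Hypothetical event names mapped to (required current state, next state).
TRANSITIONS = {
    "request": (None, TripState.REQUESTED),
    "driver_accept": (TripState.REQUESTED, TripState.ACCEPTED),
    "trip_started": (TripState.ACCEPTED, TripState.STARTED),
}


def apply_event(state, event):
    """Return the next trip state, or raise if the event arrives out of order."""
    required, next_state = TRANSITIONS[event]
    if state is not required:
        raise ValueError(f"event {event!r} is not valid in state {state}")
    return next_state


def replay(events):
    """Fold an ordered sequence of events into the final trip state."""
    state = None  # no trip exists before the request event
    for event in events:
        state = apply_event(state, event)
    return state
```

Replaying `["request", "driver_accept", "trip_started"]` ends in `TripState.STARTED`, while an out-of-order stream raises `ValueError`.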
Scale
~400 Cities
Many Billion Events per Day
• Geo Space
• Vehicle Types
• Time
• Indexing, Lookup, Rendering
• Symmetric Neighbors
• Convex & Compact Regions
• Equal Areas
• Equal Shape
Space -> Hexagons
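The "symmetric neighbors" property can be checked with a small sketch: in a hexagonal grid all six neighbors of a cell sit at the same center-to-center distance, unlike a square grid where diagonal neighbors are farther away. The axial coordinates and pointy-top layout below are a common textbook convention and an assumption here, not necessarily the indexing scheme used at Uber.

```python
import math

# The six axial-coordinate offsets (dq, dr) of a hexagon's neighbors.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_neighbors(q, r):
    """Return the axial coordinates of the six cells adjacent to (q, r)."""
    return [(q + dq, r + dr) for dq, dr in HEX_DIRECTIONS]


def hex_center(q, r, size=1.0):
    """Planar x/y center of a pointy-top hexagon with the given edge size."""
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return x, y


def neighbor_distances(q, r, size=1.0):
    """Center-to-center distance from (q, r) to each of its six neighbors."""
    center = hex_center(q, r, size)
    return [math.dist(center, hex_center(nq, nr, size))
            for nq, nr in hex_neighbors(q, r)]
```

Every value returned by `neighbor_distances` equals `sqrt(3) * size`, which is what makes neighborhood-based smoothing and lookups uniform in every direction.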
Granular Data
Multi-resolution Realtime Forecasting, Airport ETR
ML Examples
Real-time spatiotemporal
forecasting at a variable
resolution of time and space
Example 1
Rider Demand Forecasting
Predict #of Riders per hexagon for various time horizons
Spatial granularity & Multiresolution Forecasting
The more you aggregate or zoom out, the more trends emerge
Sparsity at hexagon level: many hexagons have little signal
1. Forecast at the hex-cluster level
2. Using past activity for a similar time window, apportion out total activity from the hex-cluster to its component hexagons
Multiresolution Forecasting
Forecasting at different spatial granularity
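The two-step approach can be sketched as follows: take a forecast already made at the hex-cluster level and split it across the component hexagons in proportion to their past activity. The function name, the dictionary input, and the even-split fallback for clusters with no history are assumptions made for illustration.

```python
def apportion_forecast(cluster_forecast, past_activity):
    """Split a hex-cluster forecast across its hexagons by historical share.

    past_activity maps a hexagon id to its activity count over a similar
    past time window (hypothetical input format).
    """
    total = sum(past_activity.values())
    if total == 0:
        # Assumption: with no history at all, fall back to an even split.
        share = 1.0 / len(past_activity)
        return {hex_id: cluster_forecast * share for hex_id in past_activity}
    return {hex_id: cluster_forecast * count / total
            for hex_id, count in past_activity.items()}
```

For example, a cluster forecast of 100 riders with past counts of 30, 10 and 0 is apportioned as 75, 25 and 0, so the per-hexagon values always sum back to the cluster total.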
Example 2
Airport ETR
Airport Taxi Line Uber Airport Lot
Flight Arrival (t1) Client Eyeball (t2) Pickup Request (t3)
Airport Demand (ETR)
Mean Delay: ~30 minutes
Half Life: ~1.0 minute
“ETR too much. I bail out ..”
Solution: Time Meter Banner
“Only about 20 minutes. I would wait!”
20 minutes wait to get a $40 trip, oh yeah!
Data Science Flow
A Typical Data Scientist Workflow
1. Analyze/Prepare: Data exploration, cleansing, transformations etc.
2. Feature Selection: Evaluate strength of various signals
3. Model Fitting: Use Python/R etc. to fit Model.
4. Evaluation: Evaluate Model Performance
5. Storage: Store Model with versioning
6. Serving/Dissemination: Apply Model and serve predictions
7. Monitoring: Evaluate Runtime Performance
Data Preparation
Data Processing
• Feature Selection: Evaluate strength of various signals
• Model Fitting: Use Python/R etc. to fit Model.
• Evaluation: Evaluate Model Performance
• Storage: Store Model with versioning
Data Scientists (Analytics)
Overview
Streamline the forecasting process from conception to production
• Streams with flexible geo-temporal resolution
• Valuable external data feeds
• Modular, reusable components at each stage
• Same code for offline model fitting and production to enable fast model iteration
Operators & Computation DAGs
Feature Generation
Online Models
Offline Model Fitting
Predictions, Metrics & Visualizations
Streams
External Data
• Airport feed
• Weather feed
• Concerts feed
Realtime Models
- Something happened at a time and a place. Now we will evaluate the DAG
- DAG evaluated for a single instant in time
real-time spatiotemporal forecasting at a variable resolution of time and space
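A minimal sketch of what "evaluate the DAG for a single instant in time" could look like: operators are nodes, each event triggers one evaluation in dependency order, and the same `evaluate` function can be mapped over historical events offline or called per event online. The operator names, the event fields, and the 1.5 weather multiplier are all hypothetical.

```python
from graphlib import TopologicalSorter


# Hypothetical operators: each takes its upstream values and the raw event.
def demand(inputs, event):
    return event["requests"]


def weather_factor(inputs, event):
    return 1.5 if event["raining"] else 1.0


def forecast(inputs, event):
    return inputs["demand"] * inputs["weather_factor"]


# Node name -> (operator, names of the nodes it depends on).
DAG = {
    "demand": (demand, []),
    "weather_factor": (weather_factor, []),
    "forecast": (forecast, ["demand", "weather_factor"]),
}


def evaluate(dag, event):
    """Evaluate every node of the DAG for one event (one instant in time)."""
    graph = {name: deps for name, (_, deps) in dag.items()}
    values = {}
    # static_order() yields each node after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        operator, deps = dag[name]
        values[name] = operator({dep: values[dep] for dep in deps}, event)
    return values
```

Online, `evaluate(DAG, event)` runs once per incoming event; offline, the same call inside a list comprehension over stored events produces training rows, which is one way to keep model fitting and production on the same code.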
Under the hood ..
Tools & Framework
• Curated set of algorithms
• Model Versioning
• Model Performance & Visualizations
• Automated Deployment Workflow
• …
Machine Learning as a Service
ML workflow at Uber
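As a rough illustration of model versioning, the sketch below keeps every saved version of a named model and lets a caller load either the latest or a specific one. `ModelStore` is a hypothetical in-memory stand-in; a real service would persist models durably and track metadata such as performance metrics.

```python
import pickle


class ModelStore:
    """Hypothetical in-memory store that keeps every version of each model."""

    def __init__(self):
        self._versions = {}  # model name -> list of serialized models

    def save(self, name, model):
        """Serialize the model and return its 1-based version number."""
        blobs = self._versions.setdefault(name, [])
        blobs.append(pickle.dumps(model))
        return len(blobs)

    def load(self, name, version=None):
        """Load a specific version, or the latest when version is None."""
        blobs = self._versions[name]
        blob = blobs[-1] if version is None else blobs[version - 1]
        return pickle.loads(blob)
```

Saving a model twice under the same name yields versions 1 and 2; `load(name)` returns the newest, and `load(name, version=1)` allows a rollback.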
Open Source Technologies
Samza
• Well integrated with Kafka
• Built in State Management
• Built in Checkpointing
Spark Streaming
• Micro Batch based processing
• Good integration with HDFS & S3
• Exactly once semantics
Distributed Indexes & Queries
Versatile aggregations
Jupyter/IPython
• Great community support
• Data Scientists familiar with Python
Challenges & Opportunities
• What’s the best model for integrating vast amounts of disparate kinds of information over space and time?
• What’s the best way of building spatiotemporal models in a fashion that is effective, elegant, and debuggable?
• About 100 or so more … :-)
ML Problems
Challenges
Links
Thank you!
• Realtime Streaming at Uber (https://www.infoq.com/presentations/real-time-streaming-uber)
• Spark at Uber (http://www.slideshare.net/databricks/spark-meetup-at-uber)
• Career at Uber (https://www.uber.com/careers/)
• https://join.uber.com/marketplace
Happy to discuss design/architecture
Q & A
No product/business questions please :-)
@stonse
Sudhir Tonse
@stonse
Thank you

Contenu connexe

Tendances

QCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uberQCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uberDanny Yuan
 
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...Spark Summit
 
Real time, streaming advanced analytics, approximations, and recommendations ...
Real time, streaming advanced analytics, approximations, and recommendations ...Real time, streaming advanced analytics, approximations, and recommendations ...
Real time, streaming advanced analytics, approximations, and recommendations ...DataWorks Summit/Hadoop Summit
 
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...Amazon Web Services
 
Deep learning at supercomputing scale by Rangan Sukumar from Cray
Deep learning at supercomputing scale  by Rangan Sukumar from CrayDeep learning at supercomputing scale  by Rangan Sukumar from Cray
Deep learning at supercomputing scale by Rangan Sukumar from CrayBill Liu
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...confluent
 
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...Databricks
 
Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...
Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...
Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...Spark Summit
 
Big Data and High Performance Computing Solutions in the AWS Cloud
Big Data and High Performance Computing Solutions in the AWS CloudBig Data and High Performance Computing Solutions in the AWS Cloud
Big Data and High Performance Computing Solutions in the AWS CloudAmazon Web Services
 
Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan RavatSpark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan RavatSpark Summit
 
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...Spark Summit
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Guido Schmutz
 
Machine learning at scale by Amy Unruh from Google
Machine learning at scale by  Amy Unruh from GoogleMachine learning at scale by  Amy Unruh from Google
Machine learning at scale by Amy Unruh from GoogleBill Liu
 
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Databricks
 
20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws20181027 deep learningcommunity_aws
20181027 deep learningcommunity_awsHirokuni Uchida
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteGigaom
 
Spark Summit EU talk by Chris Pool and Jeroen Vlek
Spark Summit EU talk by Chris Pool and Jeroen Vlek Spark Summit EU talk by Chris Pool and Jeroen Vlek
Spark Summit EU talk by Chris Pool and Jeroen Vlek Spark Summit
 
Data Analysis on AWS
Data Analysis on AWSData Analysis on AWS
Data Analysis on AWSPaolo latella
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Chris Fregly
 

Tendances (20)

QCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uberQCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uber
 
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
 
Real time, streaming advanced analytics, approximations, and recommendations ...
Real time, streaming advanced analytics, approximations, and recommendations ...Real time, streaming advanced analytics, approximations, and recommendations ...
Real time, streaming advanced analytics, approximations, and recommendations ...
 
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
 
Deep learning at supercomputing scale by Rangan Sukumar from Cray
Deep learning at supercomputing scale  by Rangan Sukumar from CrayDeep learning at supercomputing scale  by Rangan Sukumar from Cray
Deep learning at supercomputing scale by Rangan Sukumar from Cray
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
 
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
 
Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...
Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...
Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...
 
Big Data and High Performance Computing Solutions in the AWS Cloud
Big Data and High Performance Computing Solutions in the AWS CloudBig Data and High Performance Computing Solutions in the AWS Cloud
Big Data and High Performance Computing Solutions in the AWS Cloud
 
Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan RavatSpark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan Ravat
 
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
Machine learning at scale by Amy Unruh from Google
Machine learning at scale by  Amy Unruh from GoogleMachine learning at scale by  Amy Unruh from Google
Machine learning at scale by Amy Unruh from Google
 
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
 
20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
 
HPC in the Cloud
HPC in the CloudHPC in the Cloud
HPC in the Cloud
 
Spark Summit EU talk by Chris Pool and Jeroen Vlek
Spark Summit EU talk by Chris Pool and Jeroen Vlek Spark Summit EU talk by Chris Pool and Jeroen Vlek
Spark Summit EU talk by Chris Pool and Jeroen Vlek
 
Data Analysis on AWS
Data Analysis on AWSData Analysis on AWS
Data Analysis on AWS
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
 

En vedette

Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data AnalyticsAnkur Bansal
 
Uber Analytics Test
Uber Analytics TestUber Analytics Test
Uber Analytics TestCoursetake
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at UberDatabricks
 
Pros and Cons of a MicroServices Architecture talk at AWS ReInvent
Pros and Cons of a MicroServices Architecture talk at AWS ReInventPros and Cons of a MicroServices Architecture talk at AWS ReInvent
Pros and Cons of a MicroServices Architecture talk at AWS ReInventSudhir Tonse
 
UBER Analytics Preparation Course v.3.1 & 6.16: Services & Vocabulary - TEST4U
UBER Analytics Preparation Course v.3.1 & 6.16: Services & Vocabulary - TEST4UUBER Analytics Preparation Course v.3.1 & 6.16: Services & Vocabulary - TEST4U
UBER Analytics Preparation Course v.3.1 & 6.16: Services & Vocabulary - TEST4Uinfolearn - TEST4U
 
Machine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleMachine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleXavier Amatriain
 
Uber Interview Questions and Process: How to Pass Easily
Uber Interview Questions and Process: How to Pass EasilyUber Interview Questions and Process: How to Pass Easily
Uber Interview Questions and Process: How to Pass EasilyInterview Steps
 
MicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleMicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleSudhir Tonse
 
Using NCC Group Web Performance Data Creatively
Using NCC Group Web Performance Data CreativelyUsing NCC Group Web Performance Data Creatively
Using NCC Group Web Performance Data CreativelyGareth Hughes
 
On Analyzing and Specifying Concerns for Data as a Service
On Analyzing and Specifying Concerns for Data as a ServiceOn Analyzing and Specifying Concerns for Data as a Service
On Analyzing and Specifying Concerns for Data as a ServiceHong-Linh Truong
 
Data-As-A-Service to enable compliance reporting
Data-As-A-Service to enable compliance reportingData-As-A-Service to enable compliance reporting
Data-As-A-Service to enable compliance reportingAnalyticsWeek
 
Netflix - Elevating Your Data Platform - TDWI Keynote - San Diego 2015
Netflix - Elevating Your Data Platform - TDWI Keynote - San Diego 2015Netflix - Elevating Your Data Platform - TDWI Keynote - San Diego 2015
Netflix - Elevating Your Data Platform - TDWI Keynote - San Diego 2015Kurt Brown
 
Data Driven Growth (Montreal 2015)
Data Driven Growth (Montreal 2015)Data Driven Growth (Montreal 2015)
Data Driven Growth (Montreal 2015)jwegan
 
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Issac Buenrostro
 
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...Sudhir Tonse
 

En vedette (20)

Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data Analytics
 
Uber Analytics Test
Uber Analytics TestUber Analytics Test
Uber Analytics Test
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at Uber
 
Pros and Cons of a MicroServices Architecture talk at AWS ReInvent
Pros and Cons of a MicroServices Architecture talk at AWS ReInventPros and Cons of a MicroServices Architecture talk at AWS ReInvent
Pros and Cons of a MicroServices Architecture talk at AWS ReInvent
 
Uber's Business Model
Uber's Business ModelUber's Business Model
Uber's Business Model
 
UBER Analytics Preparation Course v.3.1 & 6.16: Services & Vocabulary - TEST4U
UBER Analytics Preparation Course v.3.1 & 6.16: Services & Vocabulary - TEST4UUBER Analytics Preparation Course v.3.1 & 6.16: Services & Vocabulary - TEST4U
UBER Analytics Preparation Course v.3.1 & 6.16: Services & Vocabulary - TEST4U
 
Machine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleMachine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora Example
 
Uber Interview Questions and Process: How to Pass Easily
Uber Interview Questions and Process: How to Pass EasilyUber Interview Questions and Process: How to Pass Easily
Uber Interview Questions and Process: How to Pass Easily
 
Culture
CultureCulture
Culture
 
MicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleMicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scale
 
UBER Strategy
UBER StrategyUBER Strategy
UBER Strategy
 
Product management
Product managementProduct management
Product management
 
Using NCC Group Web Performance Data Creatively
Using NCC Group Web Performance Data CreativelyUsing NCC Group Web Performance Data Creatively
Using NCC Group Web Performance Data Creatively
 
Pro_Tools_Tier_2
Pro_Tools_Tier_2Pro_Tools_Tier_2
Pro_Tools_Tier_2
 
On Analyzing and Specifying Concerns for Data as a Service
On Analyzing and Specifying Concerns for Data as a ServiceOn Analyzing and Specifying Concerns for Data as a Service
On Analyzing and Specifying Concerns for Data as a Service
 
Data-As-A-Service to enable compliance reporting
Data-As-A-Service to enable compliance reportingData-As-A-Service to enable compliance reporting
Data-As-A-Service to enable compliance reporting
 
Netflix - Elevating Your Data Platform - TDWI Keynote - San Diego 2015
Netflix - Elevating Your Data Platform - TDWI Keynote - San Diego 2015Netflix - Elevating Your Data Platform - TDWI Keynote - San Diego 2015
Netflix - Elevating Your Data Platform - TDWI Keynote - San Diego 2015
 
Data Driven Growth (Montreal 2015)
Data Driven Growth (Montreal 2015)Data Driven Growth (Montreal 2015)
Data Driven Growth (Montreal 2015)
 
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
 
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
 

Similaire à ML and Data Science at Uber - GITPro talk 2017

Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_futureNisha Talagala
 
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleData Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleDatabricks
 
Big Data Meetup #7
Big Data Meetup #7Big Data Meetup #7
Big Data Meetup #7Paul Lo
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 
Role of Analytics in Digital Business
Role of Analytics in Digital BusinessRole of Analytics in Digital Business
Role of Analytics in Digital BusinessSrinath Perera
 
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Chun-Yu Tseng
 
Architectural Considerations for Startups
Architectural Considerations for StartupsArchitectural Considerations for Startups
Architectural Considerations for StartupsNiall Roche
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesDataWorks Summit
 
Making machine learning model deployment boring - Big Data Expo 2019
Making machine learning model deployment boring - Big Data Expo 2019Making machine learning model deployment boring - Big Data Expo 2019
Making machine learning model deployment boring - Big Data Expo 2019webwinkelvakdag
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...Big Data Value Association
 
MLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
MLOps journey at Swisscom: AI Use Cases, Architecture and Future VisionMLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
MLOps journey at Swisscom: AI Use Cases, Architecture and Future VisionBATbern
 
Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Zhenxiao Luo
 
Open ETL for Real-Time Decision Making with Shuai Yuan
Open ETL for Real-Time Decision Making with Shuai YuanOpen ETL for Real-Time Decision Making with Shuai Yuan
Open ETL for Real-Time Decision Making with Shuai YuanDatabricks
 
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...Amazon Web Services
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Carol McDonald
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Provectus
 

Similaire à ML and Data Science at Uber - GITPro talk 2017 (20)

Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_future
 
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleData Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
 
From Python to Java
From Python to JavaFrom Python to Java
From Python to Java
 
Big Data Meetup #7
Big Data Meetup #7Big Data Meetup #7
Big Data Meetup #7
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 
Role of Analytics in Digital Business
Role of Analytics in Digital BusinessRole of Analytics in Digital Business
Role of Analytics in Digital Business
 
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
 
Machine learning
Machine learningMachine learning
Machine learning
 
Architectural Considerations for Startups
Architectural Considerations for StartupsArchitectural Considerations for Startups
Architectural Considerations for Startups
 
A machine learning and data science pipeline for real companies

ML and Data Science at Uber - GITPro talk 2017

  • 1. ML and Data Science at Uber Sudhir Tonse, Engineering Lead, Uber FEB 18, 2017 GITPro 2017
  • 2. Where do we want to go today? Agenda
  • 3. Agenda: Hop on the Uber ML Ride … destination please? Introduction: Who am I and what are we talking about today? Problem Space: Why does Uber need ML and what are some of the problems we tackle? Tools of the Trade: What does Uber’s tech stack look like? Challenges & Opportunities: Challenges likely unique to Uber .. interesting opportunities
  • 4. Uber, this talk and me the speaker Introduction
  • 5. Who am I? • Engineering Leader @ Uber • Marketplace Data • Realtime Data Processing • Analytics • Forecasting • Previous -> MicroServices/Cloud Platform at Netflix • Twitter @stonse
  • 6. Introduction: Uber Marketplace, Uber’s logistic platform. Driver Partner: Our partner in the ride sharing business. Riders: Folks like you and me who request a ride on any of Uber’s transportation products, e.g. UberX, uberPool. Merchants: Restaurants or shops that have signed on to the Uber platform.
  • 7. “Transportation as reliable as running water, everywhere, for everyone” Uber Mission
  • 8. • Mapping (Routes, ETAs, …) • Fraud and Security • uberEATS Recommendations • Marketplace Optimizations • Forecasting • Driver Positioning • Health, Trends, Issues, ... • And more … ML Problems Why do we need Machine Learning? ETA, Route Optimization, Pickup Points, Pool rider matches
  • 9. Marketplace Build the platform, products, and algorithms responsible for the real time execution and online optimization of Uber's marketplace. We are building the brain of Uber, solving NP-hard algorithms and economic optimization problems at scale. Uber | Marketplace Mission
  • 10. Overall Flow: Request Event, Driver Accept Event, Trip Started Event, more events … Match Services
  • 14. • Indexing, Lookup, Rendering • Symmetric Neighbors • Convex & Compact Regions • Equal Areas • Equal Shape Space -> Hexagons
  • 16. Multi-resolution Realtime Forecasting, Airport ETR ML Examples
  • 17. Real-time spatiotemporal forecasting at a variable resolution of time and space Example 1
  • 18. Rider Demand Forecasting Predict #of Riders per hexagon for various time horizons
  • 19. Spatial granularity & Multiresolution Forecasting The more you aggregate or zoom out, trends emerge Sparsity at hexagon level: many hexagons have little signal
  • 20. 1. Forecast at the hex-cluster level 2. Using past activity for a similar time window, apportion out total activity from the hex-cluster to its component hexagons Multiresolution Forecasting Forecasting at different spatial granularity
  • 21. Airport ETR ML Example No 2. Airport Taxi Line Uber Airport Lot
  • 22. Flight Arrival (t1) Client Eyeball (t2) Pickup Request (t3) Airport Demand (ETR) Mean Delay ~30 minutes Half Life ~ 1.0 minute
  • 23. “ETR too much. I bail out ..” Solution: Time Meter Banner “Only about 20 minutes. I would wait!” 20 minutes wait to get a $40 trip, oh yeah!
  • 24. Data Science Flow: A Typical Data Scientist Workflow. Analyze/Prepare: Data exploration, cleansing, transformations etc. Feature Selection: Evaluate strength of various signals. Model Fitting: Use Python/R etc. to fit Model. Evaluation: Evaluate Model Performance. Storage: Store Model with versioning. Serving/Dissemination: Apply Model and serve predictions. Monitoring: Evaluate Runtime Performance.
  • 25. Data Preparation: A Typical Data Scientist Workflow. Analyze/Prepare: Data exploration, cleansing, transformations etc. Feature Selection: Evaluate strength of various signals. Model Fitting: Use Python/R etc. to fit Model. Evaluation: Evaluate Model Performance. Storage: Store Model with versioning. Serving/Dissemination: Apply Model and serve predictions. Monitoring: Evaluate Runtime Performance.
  • 27. Data Science Flow: A Typical Data Scientist Workflow. Feature Selection: Evaluate strength of various signals. Model Fitting: Use Python/R etc. to fit Model. Evaluation: Evaluate Model Performance. Storage: Store Model with versioning.
  • 29. Data Science Flow: A Typical Data Scientist Workflow. Analyze/Prepare: Data exploration, cleansing, transformations etc. Feature Selection: Evaluate strength of various signals. Model Fitting: Use Python/R etc. to fit Model. Evaluation: Evaluate Model Performance. Storage: Store Model with versioning. Serving/Dissemination: Apply Model and serve predictions. Monitoring: Evaluate Runtime Performance.
  • 30. Overview: Streamline the forecasting process from conception to production • Streams w/ flexible geo-temporal resolution • Valuable external data feeds • Modular, reusable components at each stage • Same code for offline model fitting and production to enable fast model iteration. Operators & Computation DAGs, Feature Generation, Online Models, Offline Model Fitting, Predictions, Metrics & Visualizations. External Data Streams: Airport feed, Weather feed, Concerts feed
  • 31. Realtime Models - Something happened at a time and a place. Now we will Evaluate the DAG - DAG evaluated for a single instant in time real-time spatiotemporal forecasting at a variable resolution of time and space
  • 32. Under the hood .. Tools & Framework
  • 33. • Curated set of algorithms • Model Versioning • Model Performance & Visualizations • Automated Deployment Workflow • … Machine Learning as a Service ML workflow at Uber
  • 34. Open Source Technologies. Spark Streaming: micro-batch based processing; good integration with HDFS & S3; exactly-once semantics. Samza: well integrated with Kafka; built-in state management; built-in checkpointing. Distributed indexes & queries; versatile aggregations. Jupyter/IPython: great community support; data scientists familiar with Python.
  • 36. • What’s the best model for integrating vast amounts of disparate kinds of information over space and time? • What’s the best way of building spatiotemporal models in a fashion that is effective, elegant, and debuggable? • About a 100 or so more … :-) ML Problems Challenges
  • 37. Links. Thank you! • Realtime Streaming at Uber (https://www.infoq.com/presentations/real-time-streaming-uber) • Spark at Uber (http://www.slideshare.net/databricks/spark-meetup-at-uber) • Career at Uber (https://www.uber.com/careers/) • https://join.uber.com/marketplace
  • 38. Happy to discuss design/architecture Q & A No product/business questions please :-) @stonse
  • 39. Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber. Sudhir Tonse @stonse Thank you
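
The two-step procedure on slide 20 (forecast at the hex-cluster level, then apportion the total to component hexagons using past activity for a similar time window) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Uber's implementation: the function name `apportion_forecast`, the hexagon ids, and the even-split fallback for a cluster with no history are all hypothetical.

```python
from typing import Dict


def apportion_forecast(cluster_forecast: float,
                       past_activity: Dict[str, float]) -> Dict[str, float]:
    """Split a cluster-level forecast across hexagons by historical share."""
    if not past_activity:
        return {}
    total = sum(past_activity.values())
    if total == 0:
        # Assumption: with no past activity at all, split the forecast evenly.
        share = cluster_forecast / len(past_activity)
        return {hex_id: share for hex_id in past_activity}
    # Each hexagon receives the fraction of activity it had historically.
    return {hex_id: cluster_forecast * count / total
            for hex_id, count in past_activity.items()}


# Hypothetical past rider counts for the same time window in earlier weeks.
history = {"hex_a": 30.0, "hex_b": 10.0, "hex_c": 0.0}
per_hex = apportion_forecast(120.0, history)
print(per_hex)
```

Forecasting the aggregate first sidesteps the sparsity noted on slide 19, since individual hexagons often carry too little signal to model directly.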
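
Slide 31 describes a computation DAG that is evaluated for a single instant in time once an event arrives. A minimal sketch of that idea is below, using the standard-library `graphlib` module (Python 3.9+) to order the operators. The node names, the operators, and the `evaluate_dag` helper are hypothetical and chosen only for illustration; the talk does not show the actual framework API.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Each node maps to (names of upstream nodes, operator applied to their values).
Node = Tuple[List[str], Callable[..., float]]


def evaluate_dag(dag: Dict[str, Node], event: Dict[str, float]) -> Dict[str, float]:
    """Evaluate every operator once, for one event at one instant in time."""
    values = dict(event)  # raw inputs observed at this instant
    sorter = TopologicalSorter({name: deps for name, (deps, _) in dag.items()})
    for name in sorter.static_order():  # upstream nodes come first
        if name in values:
            continue  # raw input, nothing to compute
        deps, op = dag[name]
        values[name] = op(*(values[d] for d in deps))
    return values


# Hypothetical features derived from one hexagon's counts at one instant.
dag: Dict[str, Node] = {
    "demand_supply_ratio": (["requests", "drivers"], lambda r, d: r / max(d, 1.0)),
    "is_busy": (["demand_supply_ratio"], lambda ratio: float(ratio > 1.5)),
}
result = evaluate_dag(dag, {"requests": 12.0, "drivers": 4.0})
print(result)
```

Because the same `dag` definition can be run over historical events or live ones, a structure like this is one way to get the "same code for offline model fitting and production" property listed on slide 30.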
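
Slide 14 lists "Symmetric Neighbors" among the reasons for indexing space with hexagons: every neighbor of a hexagon sits at the same distance from its center, which is not true of a square grid's diagonal neighbors. The sketch below checks that property using axial coordinates for a pointy-top hexagonal grid. The coordinate scheme and helper names are assumptions for illustration; the talk does not say how Uber's hexagon index is implemented.

```python
import math
from typing import List, Tuple

# The six axial-coordinate offsets to a hexagon's neighbors.
AXIAL_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_neighbors(q: int, r: int) -> List[Tuple[int, int]]:
    """Return the six cells adjacent to axial cell (q, r)."""
    return [(q + dq, r + dr) for dq, dr in AXIAL_DIRECTIONS]


def hex_center(q: int, r: int, size: float = 1.0) -> Tuple[float, float]:
    """Planar center of a pointy-top hexagon with the given circumradius."""
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return x, y


def center_distance(a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """Euclidean distance between the centers of two cells."""
    ax, ay = hex_center(*a)
    bx, by = hex_center(*b)
    return math.hypot(ax - bx, ay - by)


cell = (2, -1)
distances = [center_distance(cell, n) for n in hex_neighbors(*cell)]
print(distances)
```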