Confidential - do not distribute
Hotels.com’s journey to becoming
an Algorithmic Business
Matthew Fryer
VP, Chief Data Science Officer
mfryer@hotels.com
Part of Expedia, Inc. family
>385,000 properties
89 countries
39 languages
>30m Hotels.com Rewards Members
Home of Captain Obvious
Billions of recommendations per day, based on real-time data
Hotels.com
Data Science, Engineering, Front End Development
“Artificial Intelligence Will Be
Travel’s Next Big Thing”
Barry Diller
Chairman & Senior Executive,
Expedia, Inc.
The 3 M's are disruptive technologies:
Mobile
Messaging / NLP
Machine Learning
Core Elements of our Data Science Cloud Platform
Databricks Unified Platform
Maestro – Our Internally Developed Platform on AWS
(EMR, Spark, RStudio, IntelliJ, SBT, Jupyter, Zeppelin, Unit / QA, Metastore, Apache Airflow, Keras, TensorFlow)
Proof of Concept on Google Cloud, Beam, Spark & TensorFlow
Databricks Unified Platform
[Chart shown in 1-hour blocks; y-axis = number of 32-core instances]
• Key asset to the success of data science at Hotels.com
• Key in driving up data scientist productivity, efficiency and flexibility
• Makes our data science lifecycle much easier and faster to operate, improving speed to market
• Reliable and secure; facilitates 'highly elastic' workflows that exploit cost-effective spot instances on AWS
The hidden secret of data science and AI
Data scientists typically invest large amounts of time in feature and data engineering, areas that are ripe for a technology solution
ALPs – Algorithm Lifecycle Pipeline Service
The end-to-end ML platform
[Diagram: ALPs architecture, with column headings Site, Data, Training, and Scoring & serving]
Components: Hotels.com, Clickstream, Experiment, Ingestion, Data pipelines, Training, Real-time scoring / bandit, Cache, Service, Reporting
Steps: Data set generation and feature extraction; Train & deploy model; Calculate scores; Store & serve scores; Assign variant; Update feedback loop with CTR, GP etc.
Layers: Data capture (accessible data); Data pipelines (develop and maintain ML / AI pipelines); Frameworks & Platforms (methods to research & exploit ML & AI innovation); Lifecycle / Deploy (implement ML / AI in production)
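The "Real-time scoring / bandit", "Assign variant" and "Update feedback loop with CTR" boxes suggest a multi-armed bandit that learns from clickstream feedback. The slides do not say which bandit algorithm ALPs uses, so the following is only a minimal epsilon-greedy sketch for intuition; the class and method names (`EpsilonGreedyBandit`, `assign_variant`, `record_feedback`) and the simulated click-through rates are hypothetical.

```python
import random


class EpsilonGreedyBandit:
    """Toy epsilon-greedy bandit over a fixed set of variants (hypothetical)."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.variants = list(variants)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.impressions = {v: 0 for v in self.variants}
        self.clicks = {v: 0 for v in self.variants}

    def ctr(self, variant):
        """Observed click-through rate; 0.0 before any impression."""
        shown = self.impressions[variant]
        return self.clicks[variant] / shown if shown else 0.0

    def assign_variant(self):
        """Explore with probability epsilon, otherwise exploit the best CTR."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)
        return max(self.variants, key=self.ctr)

    def record_feedback(self, variant, clicked):
        """Feedback loop: update counts from one clickstream event."""
        self.impressions[variant] += 1
        if clicked:
            self.clicks[variant] += 1


def simulate(true_ctr, rounds=5000, seed=0):
    """Simulate traffic against made-up true CTRs and return the bandit."""
    bandit = EpsilonGreedyBandit(true_ctr, epsilon=0.1, seed=seed)
    world = random.Random(seed + 1)
    for _ in range(rounds):
        variant = bandit.assign_variant()
        bandit.record_feedback(variant, world.random() < true_ctr[variant])
    return bandit
```

Calling `simulate({"A": 0.02, "B": 0.20})` should send most traffic to variant `B` once a few exploratory impressions have revealed its higher click-through rate. A production system would more likely use a contextual or Bayesian bandit, but the assign / observe / update loop has the same shape.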
Images are an important factor while choosing a hotel
[Bar chart: factors other than price/location rated Important or Very Important, on an axis from 0% to 80%: Loyalty Program, Reviews, Hotel Brand, Star Rating, Destination Info, Images, Hotel Info]
Reference: The Influence of Visuals in Online Hotel Research and Booking Behaviour
Computer Vision problems we try to tackle
• Near Duplicate Detection
• Scene Classification
• Image Ranking
Tagged as Bathroom
GPUs quickly became key; it took a large effort to optimize using Keras + TensorFlow (Inception v3 + ResNet)
[Chart: days (log scale, 1 to 1000) by hardware configuration: 12-CPU, 1-GPU, 1-GPU + limited cache, 16-GPU + limited cache, 16-GPU + full cache. Data labels: 493, 67, 20, 7, 4. Legend: CIFAR2, Expedia Small]
[Chart: days, 16-GPU + full cache vs. Optimized. Data labels: 15, 2.5]
Near Duplicate Detection: Real world examples
Non-Duplicates – probability 100%
Non-Duplicates – probability 95.91%
Duplicates – probability 97.98%
Duplicates – probability 98.43%
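The probabilities above imply a learned classifier, which the slides do not describe. As a much simpler baseline for intuition only (not the method used here), near duplicates can be flagged with an average hash and a Hamming-distance threshold. The function names and the `max_distance` value are illustrative assumptions, and images are represented as small grayscale grids to keep the sketch dependency-free.

```python
def average_hash(pixels):
    """Return a bit tuple: 1 where a pixel is brighter than the image mean.

    `pixels` is a small grayscale image given as a list of rows of ints
    (0-255); a real pipeline would first resize photos to e.g. 8x8.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_near_duplicate(pixels_a, pixels_b, max_distance=2):
    """Flag two images as near duplicates if their hashes nearly match.

    `max_distance` is an arbitrary illustrative threshold.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance
```

A uniformly brightened copy of a photo keeps the same hash, so it is flagged as a near duplicate, while an image with a different light/dark layout is not. Crops, rotations and re-shoots of the same room defeat this baseline, which is one reason a learned model is the more likely choice in practice.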
Using the model: Real world examples
Image labels: ROOM/BATHROOM, EXTERIOR/HOTEL, INTERIOR/SEATING_LOBBY, ROOM/LIVING_ROOM, ROOM/GUESTROOM, FACILITIES/DINING, INTERIOR/SEATING_LOBBY, FACILITIES/POOL
Accuracy & Confusion Matrix
• After many long-winded manual iterations of regularization and hyperparameter tuning
• We achieved good accuracy and little confusion between classes in the confusion matrix
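For readers unfamiliar with the two metrics, here is a small sketch of how a confusion matrix and overall accuracy are computed from true and predicted labels. The helper names are hypothetical and any labels passed in are illustrative; the slides do not publish the actual figures.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return matrix[i][j]: items whose true label is labels[i], predicted labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of predictions on the diagonal, i.e. assigned the correct class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Off-diagonal cells show which scene classes are mistaken for each other (for example a guest room tagged as a living room), which is more actionable than a single accuracy number when deciding where to add training data.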
Optimizing the photo order for improved customer experiences
[Images: original photo order vs. model photo order]
Reference: Radisson Blu Edwardian Berkshire Hotel, London
Finding the right hotel in our marketplace is core to our customers' needs.
As an example, different user segments like to stay in different locations
[Map of London areas: Kensington, Bloomsbury, Heathrow, Canary Wharf, Paddington, Westminster, London City Airport, Chelsea, Battersea, Wimbledon, Wembley, City of London]
[Diagram: Utility (three panels) versus Intent (click), from "just browsing!" to "BOOK!"]
Thank you
mfryer@hotels.com
https://uk.linkedin.com/in/matthewfryer
@mattfryer

Hotels.com's Journey to Becoming an Algorithmic Business... Exponential Growth in Data Science Whilst Migrating to Apache Spark + Cloud All at the Same Time, with Matt Fryer

Editor's notes

  1. Comments: I checked the IR website for the latest data that we have made public (including the Annual Report). Planning to linger only briefly on this slide; will call out a few data points, especially recommendation volume and loyalty members.
  2. Comments: (This slide has a build on it; you can see it in slideshow view.) General thank you to Spark Summit and Databricks for inviting me. Share the goal of the presentation, e.g. highlight the focus on transforming customer experiences with algorithms, Hotels.com's move to Spark / cloud in the last year, and some of the interesting things we are doing. Link to the slide: highlight that it used to feel like there was data everywhere; the size of the torch is growing every day.
  3. Comments: Creation / build of a new Data Science function, move to public cloud (mainly AWS + some Azure / GCP) from on-prem, and move to Spark from SAS / core Hadoop, all in the last year. As per the title, comment that we are entering a golden age of data science where we can now use data to find patterns and build algorithms to help customer experiences. Imagine the world when we enter adulthood, aka maturity. Given the potential, I think we are all toddlers with so much more to learn and figure out. Better to be fast first (example of testing and freedom to innovate); ideally we are often correct too, which is a bonus!
  4. Comments: It has taken complete teamwork from across the business to deliver success and well-aligned pipelines. i) We built a data science function in the last 2 years; it is a team effort, and data science / algorithms sit on the back of the workhorse of engineers. ii) This allows algorithms to make choices and understand patterns to optimize for customer experiences rather than limited optimization. iii) Part of the secret has been matching data scientists with dedicated data, network, DevOps and software engineers on the platform. iv) Create a community (big group hug) to share approaches and work together for success. Overall we have >20 amazing data scientists and >15 dedicated data science engineers, growing fast, plus 100s of analysts and engineers.
  5. Comments: Machine learning and artificial intelligence will combine to manage companies’ big data troves, and there will be layers of innovation “tacked onto distribution systems.” Support from the very top of the company has been vital to move forward at pace with wider organization alignment. Dara’s comment from the last earnings call on the 3 M’s and organic intelligence: “I think AI is a good deal down the road. I think right now, we are more dependent on OI, organic intelligence, here; of folks here at the company. I think as far as disruptive technology, I do like to talk about the 3 Ms, and it's not disruptive. It's just happening. One is mobile for us. And right now with most brands, over 1/3 of our transactions are mobile. Over half of our traffic is mobile. And the cool thing about mobile is it's always on and it gives you location context. The second M for us that's emerging especially in the APAC markets, are messaging. And what messaging does for us is it allows two-way communication at any time, but it also combines identity with that communication. And once you have identity, you can start communicating with someone on a one-to-one basis. Most of our systems right now are built to serve the average. This is a consumer where you come to Expedia, most of our systems are built to serve the average consumer. Now more and more we can optimize to the specific customer, and you combine that with a third M, which is machine learning, it is only possible to optimize to the individual based on very significant amounts of data, very significant amounts of interaction so that you can start treating every single customer in a different way. You can go back to the olden days when your travel agent knew exactly what you wanted. This is going to be disruptive, but it's going to be a slow disruption as we learn more”
  6. Comments: i) Hotels.com / Expedia, in the first travel revolution, empowered consumers globally with the transparency of price, variety, choice and content. ii) Machine learning / algorithms are creating the next travel revolution, transforming consumer experiences and effectively powering the turnaround of the travel agent. iii) The future is having the conversation with the travel agent with a modern twist, via messaging. (20 years ago we re-invented and democratised travel and information (green screens around); now, with data science and Spark, we can power personalised experiences and give customers access to the best of both.) iv) Machine learning is now at the strategic core of the growth and future of Hotels.com.
  7. Databricks: optimised for data science; easy-to-use UI (notebook); advanced job scheduling; spot pool capability; great for algorithm development & feature engineering (aka ETL); awesome support from Spark engineers. Maestro (AWS platform): integrated platform for large model development and deployment; advanced cluster support including Maven / Artifactory; Maestro Framework (internal extension to Spark ML); individual environment per data scientist; fast to R&D / fully ephemeral. Google: launched PoC on Google Cloud to evaluate the Google approach to AI / machine learning, including TensorFlow, GPU, NLP / Vision APIs, ML Engine, Datalab notebooks, Apache Beam / Dataflow. Maestro Framework: it is the code base responsible for building machine learning models for HCOM. It is developed in-house using Scala 2.11 & Spark 2.0 (migrating to 2.1). It is an ML framework which standardizes and speeds up the way we build models and provides all the necessary tools for training, testing & validating models.
  8. Facilitated use of ‘extreme elasticity’, including spot instances. Saving on cost whilst using huge compute power. Speed to market dramatically increases. Spot instances cost ~10-20% of the on-demand price. Databricks is making things easy, and we are doing image classification across the Expedia portfolio. Highlight the value proposition that you get with Databricks over open source Spark: works out of the box, elasticity, ease of use, notebooks, etc.
  9. Across millions of hotel and user-submitted images. Critical use case on mobile.
  10. Comment: Highlight that images have historically not been algorithmically optimized, and we now have hundreds of thousands of user photos to categorise and sort. Built in Spark and TensorFlow with a convolutional neural net approach, with some surprisingly good accuracy; would recommend everyone try their hand at deep learning.
  11. Target: detect near-duplicate images on the PDP. Dataset: a synthetic dataset produced by applying transformations to hotel photos; size ~6 million images. Network: a custom Siamese network on top of the Scene Detection classifier. Results: 99.97% accuracy on the synthetic dataset; validated on real-world images. Important to use your own data: we obtained 82% from off-the-shelf deep learning APIs (such as Google Vision API etc.).
  12. Linking customer feedback loops with the neural nets to begin optimizing for the most relevant sort order of images for different customers.
  13. Comments: Personalisation, especially micro-segmentation, is crucial, taking the maximum number of signals; Spark has enabled us to cope with the scale of data. Popularity is balanced with diversity / quality and niche customer needs, all in the context of linking the searches of users doing 4-9 different searches.
  14. Size increase of 20x the data, covering attribution of all customer clicks (typically 4-9 searches per user), with 10x the data columns. Facilitates personalization / micro-segments.
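
Note 11 describes a synthetic near-duplicate dataset produced by applying transformations to hotel photos. The notes do not say which transformations were used, so the following is only a sketch of how labelled pairs for a Siamese-style model might be generated. The three transformations, the function names and the tiny random "images" are all assumptions for illustration; the Siamese network itself is not shown.

```python
import random

# An "image" here is a small grid of grayscale values (0-255). The real
# photos and the transformations actually used are not given in the notes.


def brighten(img, delta=20):
    """Shift every pixel by delta, clamped to the 0-255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in img]


def crop_border(img, margin=1):
    """Drop `margin` pixels from every edge."""
    return [row[margin:-margin] for row in img[margin:-margin]]


# Hypothetical transformation set, chosen only for this example.
TRANSFORMS = [brighten, flip_horizontal, crop_border]


def make_pairs(images, rng):
    """Return (image_a, image_b, label) triples; label 1 = near-duplicate."""
    pairs = []
    for i, img in enumerate(images):
        # Positive pair: the photo and a transformed copy of itself.
        transform = rng.choice(TRANSFORMS)
        pairs.append((img, transform(img), 1))
        # Negative pair: the photo and a different photo.
        j = rng.choice([k for k in range(len(images)) if k != i])
        pairs.append((img, images[j], 0))
    return pairs


rng = random.Random(0)
photos = [[[rng.randrange(256) for _ in range(4)] for _ in range(4)]
          for _ in range(3)]
pairs = make_pairs(photos, rng)
print(len(pairs), "pairs,", sum(label for _, _, label in pairs), "positive")
```

Generating positives from transformed copies gives labels for free, which is presumably what makes a dataset of millions of pairs practical; the note's comparison with off-the-shelf APIs suggests validating on real images afterwards, as the authors did.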