Improving the power of
a picture via A/B testing
Gopal Krishnan Director of Engineering
Dale Elliott Senior Software Engineer
Kenny Xie Senior Data Scientist
TV is a lean back experience
90 seconds
Pop Quiz
A round plane figure
whose boundary (the
circumference) consists
of points equidistant from
a fixed point (the center).
Can we do better?
Sensitivity test
The Short Game
Single title A/B test result
14% better 6% better
Testable Hypothesis
Displaying better artwork will
result in greater engagement and
retention by helping members
discover stories they will enjoy
even faster.
Data Driven
Netflix API service
Beacon (telemetry
collection service)
Hive (computes artwork
performance metrics for
every title/country/locale
pair)
Netflix Image Library
Device (PS3, website, etc.)
Feedback loop
Serve artwork
based on A/B logic
Feed with artwork
based on perf
metric
Collect plays &
client impressions
Anatomy of artwork
Stable Image id for ground truth data
source-file-id-1 source-file-id-2 source-file-id-3
Lineage-id-1
Diversity matters
Pop Quiz
1 2 3 4 5 6
Building the A/B tests
vs.
Pairs of Explore and Exploit Tests
Explore Test
Current production
explore
New explore
Exploit Test
Current production
exploit
New exploit
Winner
Winner
● No member overlap
● Explore and exploit allocation happens
simultaneously
Multi-title explore allocation test
          Cell 1          Cell 2         Cell 3         Cell 4         Cell 5         Cell 6
Title 1   Control Image   Test Image 1   Test Image 2   Test Image 3   Test Image 4   Test Image 5
Title 2   Control Image   Test Image 1   Test Image 2   Test Image 3   Test Image 4   Test Image 5
...       ...             ...            ...            ...            ...            ...
Title n   Control Image   Test Image 1   Test Image 2   Test Image 3   Test Image 4   Test Image 5
Test Evolution: Single Title to Multiple Titles
Single title, multi-cell test
Engineering implementation / complexity
• Our A/B infrastructure is optimized for comparing test cells to each other
• Need to compare data across cells for one title out of many
• Avoid creating hundreds of tests (one per title)
Solution:
• Treat all the members who see a title’s images as a virtual test
• Impression tracking -- not just test cell allocation -- defines test population per
title
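The "virtual test" idea above can be sketched as a grouping of impression records by title. This is only an illustration under stated assumptions: the `Impression` shape, the `populations` method, and the image ids are hypothetical names, not the Netflix implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class VirtualTests {
    /** One logged impression: a member was shown a given image for a title (hypothetical shape). */
    static final class Impression {
        final long memberId;
        final long titleId;
        final String imageId;

        Impression(long memberId, long titleId, String imageId) {
            this.memberId = memberId;
            this.titleId = titleId;
            this.imageId = imageId;
        }
    }

    /** Builds titleId -> imageId -> members who saw that image; each title's map acts as its virtual test. */
    static Map<Long, Map<String, Set<Long>>> populations(List<Impression> impressions) {
        Map<Long, Map<String, Set<Long>>> byTitle = new TreeMap<>();
        for (Impression imp : impressions) {
            byTitle.computeIfAbsent(imp.titleId, t -> new TreeMap<>())
                   .computeIfAbsent(imp.imageId, i -> new TreeSet<>())
                   .add(imp.memberId);
        }
        return byTitle;
    }

    public static void main(String[] args) {
        List<Impression> log = Arrays.asList(
            new Impression(1L, 100L, "control"),
            new Impression(2L, 100L, "test-1"),
            new Impression(1L, 200L, "test-2"));
        System.out.println(populations(log));
    }
}
```

A member who was allocated to the test but never saw a title does not appear in that title's population, which is the point of using impressions rather than cell allocation alone.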
Engineering implementation / complexity
Allocated Members
Title A impressions
Title B impressions
Problems with multi-title, multi-cell test
• Cohorts of testers who all saw the same set of images
• Same number of images for every title
Single-cell explore allocation test
Title 1
“Cells”   1         2         3         4         5         6
Image     Control   Image 1   Image 2   Image 3   Image 4   Image 5
Title 2
“Cells”   1         2         3         4
Image     Control   Image 1   Image 2   Image 3
Test Evolution: Images per title
Multi-cell explore evolves to Single-cell explore
Devolves?
Virtual Tests inside one test cell
Engineering implementation / complexity
Goals
• No cohorts
• Image stickiness
• No persistent storage
We used a deterministic, pseudo-random calculation
• new Random(memberID * titleId).nextInt(numImages)
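A minimal sketch of how the one-line calculation above might be wrapped, assuming 64-bit ids and that index 0 stands for the control image. The class and method names are hypothetical. Because `java.util.Random` is specified to produce the same sequence for the same seed, the same member and title map to the same index on any server, with nothing stored.

```java
import java.util.Random;

class DeterministicImagePicker {
    /**
     * Returns an index in [0, numImages) that is stable for a given member and title.
     * Treating index 0 as the control image is an assumption of this sketch.
     */
    static int pickImageIndex(long memberId, long titleId, int numImages) {
        if (numImages <= 0) {
            throw new IllegalArgumentException("numImages must be positive");
        }
        // Seed expression taken from the slide. long multiplication wraps on
        // overflow, which is still deterministic.
        return new Random(memberId * titleId).nextInt(numImages);
    }

    public static void main(String[] args) {
        int first = pickImageIndex(12345L, 678L, 6);
        int second = pickImageIndex(12345L, 678L, 6);
        System.out.println(first == second); // same inputs give the same image index
    }
}
```

One consequence worth noting: the result depends on `numImages`, so the assignment is only sticky while a title's image list stays the same length.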
Netflix API Service
Engineering implementation / complexity
No persistence needed
Cells     Cell 1       Cell 2
Title 1   Ctrl Image   Random of [Ctrl, Test 1, ... Test X1]
Title 2   Ctrl Image   Random of [Ctrl, Test 1, ... Test X2]
...       ...          ...
Title n   Ctrl Image   Random of [Ctrl, Test 1, ... Test Xn]
Image Data Feed (Title ID, Image Lists)
Netflix Image Lib.
Random assignment to all test members.
Single-cell explore test
● No more cohorts
● Flexible
● Clear winners for many titles
● Overall win based on key metrics
Can we do better?
Result
Problems
• Overexposure of under-performing images
• Underexposure of niche titles
• Unfair burden on testers
Title-level allocation test
Solution: Title-Level Allocation
• Limit allocated members per title
• Less exposure of under-performing images
• Still get enough data to determine winner
• Allocate from a gigantic pool
• More exposure for niche titles
• Spreads testing burden
Test Evolution: Testers per title
Diagram labels: Title A, Title B, Title C
● Some titles have few testers
in the small pool
● Most titles have full testing
allocation from larger pool
Engineering implementation / complexity
• Goals from previous test
• No cohorts
• Image stickiness
• No persistent storage
• New goals
• Less exposure for under-performing images
• More exposure for niche titles
• Faster decision and rollout of winning images
• This time, we needed to persist the allocations
Architecture
Components: Netflix API Service, Image Data Feed, Netflix Image Library, Yellow Square (Y2), Title Metadata Service (VMS), Kafka
Decision flow:
• Member allocated? Yes: select assigned image
• Member allocated? No: check whether the title is fully allocated
• Title fully allocated? Yes: select control image
• Title fully allocated? No: allocate with random assignment, log and store allocation, select assigned image
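The allocation decision in this architecture can be sketched as follows. This is one reading of the flowchart, whose arrows did not survive extraction, so the branch order is an assumption. The in-memory maps are stand-ins for the real allocation store and log, and all names (`TitleLevelAllocator`, `selectImage`, `maxMembersPerTitle`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class TitleLevelAllocator {
    // In-memory stand-ins for the persisted allocations (assumption for this sketch).
    private final Map<String, Integer> assignments = new HashMap<>();
    private final Map<Long, Integer> allocatedPerTitle = new HashMap<>();
    private final int maxMembersPerTitle;
    private final Random random;

    TitleLevelAllocator(int maxMembersPerTitle, long seed) {
        this.maxMembersPerTitle = maxMembersPerTitle;
        this.random = new Random(seed);
    }

    /** Returns the image index to serve; index 0 is the control image by assumption. */
    int selectImage(long memberId, long titleId, int numImages) {
        String key = memberId + ":" + titleId;
        Integer assigned = assignments.get(key);
        if (assigned != null) {
            return assigned;                      // member already allocated: serve the assigned image
        }
        int used = allocatedPerTitle.getOrDefault(titleId, 0);
        if (used >= maxMembersPerTitle) {
            return 0;                             // title fully allocated: serve the control image
        }
        int choice = random.nextInt(numImages);   // allocate with random assignment
        assignments.put(key, choice);             // "log and store" step, simplified to a map write
        allocatedPerTitle.put(titleId, used + 1);
        return choice;
    }

    public static void main(String[] args) {
        TitleLevelAllocator allocator = new TitleLevelAllocator(1, 42L);
        System.out.println(allocator.selectImage(1L, 10L, 4)); // allocated to some image
        System.out.println(allocator.selectImage(2L, 10L, 4)); // cap reached, so control (0)
    }
}
```

The per-title cap is what limits exposure of under-performing images, at the cost of needing stored state that the earlier deterministic scheme avoided.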
Oops
● Underestimated traffic
● Many titles allocated per member at once
● Write to Y2 for every allocation
Result: Service disruption; we had to turn off the test
Scaling
Components: Netflix API Service, Image Data Feed, Netflix Image Library, Yellow Square (Y2), Kafka, Stream Processor
Flow: allocate with random assignment, log allocation to Kafka, Stream Processor writes to Y2 (1 write per member every 30 sec.)
Storing allocations as they
occurred overloaded Yellow
Square.
Now, we log them to a stream and
consolidate many writes into one.
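The consolidation step can be illustrated with a small buffer that collects allocation events per member and emits one combined write per member on each flush. This is a sketch only: `AllocationBatcher`, its methods, and the `titleId=imageIndex` string encoding are hypothetical, and a real stream processor would trigger the flush on a timer (the slide mentions every 30 seconds) instead of by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AllocationBatcher {
    // Pending allocation events, grouped by member, in arrival order.
    private final Map<Long, List<String>> pending = new LinkedHashMap<>();

    /** Buffers one allocation event instead of writing it to the store immediately. */
    synchronized void logAllocation(long memberId, long titleId, int imageIndex) {
        pending.computeIfAbsent(memberId, m -> new ArrayList<>())
               .add(titleId + "=" + imageIndex);
    }

    /** Drains the buffer and returns one consolidated write per member. */
    synchronized Map<Long, List<String>> flush() {
        Map<Long, List<String>> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        AllocationBatcher batcher = new AllocationBatcher();
        batcher.logAllocation(1L, 10L, 2);
        batcher.logAllocation(1L, 11L, 0);
        batcher.logAllocation(2L, 10L, 1);
        // Three events become two store writes, one per member.
        System.out.println(batcher.flush());
    }
}
```

When a member is allocated to many titles at once, the number of store writes then scales with members per interval instead of with allocations.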
Who to Test on?
Test on the same population you are
planning to roll the changes out to
Two Member Cohorts
• New Members are assigned to the experimental condition at the time
of sign-up
• Existing Members are assigned to the experimental condition at any
time after their free trial has ended
Decision Focuses More on New Members
• A “pure” sample which is not tainted by a previous Netflix experience
• A more sensitive sample (“on the fence”)
Tiers of Metrics
• Primary: Customer retention
• Secondary: Streaming hours
• Tertiary: all other customer engagement metrics
• Play rate
• Number of Netflix visits
• ...
How to Pick the Winner in Explore?
• Take fraction = (number of users who played the title) /
(number of users who saw the title)
• Correlated with retention
• Measurable from day one
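As a rough illustration of the metric above, the sketch below computes take fraction per image and picks the highest. It deliberately ignores statistical significance and the questions the following slides raise about what counts as a play or an impression. The class, the method names, and the `{played, saw}` array layout are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TakeFraction {
    /** Take fraction = members who played the title / members who saw the title. */
    static double takeFraction(long membersWhoPlayed, long membersWhoSaw) {
        if (membersWhoSaw <= 0) {
            return 0.0; // no impressions yet, so there is nothing to measure
        }
        return (double) membersWhoPlayed / membersWhoSaw;
    }

    /** counts maps imageId to {played, saw}; returns the image with the highest take fraction. */
    static String winner(Map<String, long[]> counts) {
        String best = null;
        double bestFraction = -1.0;
        for (Map.Entry<String, long[]> entry : counts.entrySet()) {
            double fraction = takeFraction(entry.getValue()[0], entry.getValue()[1]);
            if (fraction > bestFraction) {
                bestFraction = fraction;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, long[]> counts = new LinkedHashMap<>();
        counts.put("control", new long[] {50, 1000});
        counts.put("test-1", new long[] {70, 1000});
        System.out.println(winner(counts));
    }
}
```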
What is a Play?
Does Impression Location Matter?
Does it Matter How Many Impressions it Takes to
Play?
Netflix just
recommended an
awesome show to
me and I am going to
watch it!!!
Does it Matter How Many Impressions it Takes to
Play?
I have seen the
show on Netflix a
few times. Maybe,
I should try it...
Take Fraction is NOT as trivial
as its definition implies.
How to Make the Final Decision?
Final decision is based on the exploit test
• Retention movement
• Streaming hours movement
• Engagement with titles explored in the test, titles not
explored in the test
• ...
Our Image Selection Test is a Win!
• Improved customer retention
• Improved customer engagement
Some Learnings
Emotions are excellent for conveying complex nuances
Great stories travel - but regional nuances can be powerful
Nice Guys Often Finish Last
Contact:
Gopal Krishnan
Dale Elliott
Kenny Xie
More details are available on the Netflix
Tech Blog.
Talk to us outside at the booth.

Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain
