SlideShare une entreprise Scribd logo
1  sur  50
A/B Testing: Avoiding
Common Pitfalls
Danielle Jabin

March 5, 2014
1

Make all the world’s music
available instantly to everyone,
wherever and whenever they
want it
2

As of March 5, 2014
3

Over 24 million active users

As of March 5, 2014
4

Access to more than 20 million
songs

As of March 5, 2014
5
6

But can we make it even
easier?
7

We can try…
…with A/B testing!
8

So…what’s an A/B test?
9

Control

A
Pitfall #1: Not limiting
your error rate
11

Source: assets.20bits.com/20081027/normal-curve-small.png
12

What if I flip a coin 100 times and get 51 heads?
13

What if I flip a coin 100 times and get 5 heads?
14
15

The likelihood of obtaining a
certain value under a given
distribution is measured by its
p-value
16

If there is a low likelihood that a
change is due to chance alone,
we call our results statistically
significant
17

What if I flip a coin 100 times and get 5 heads?
18

Statistical significance is measured by alpha
● alpha levels of 5% and 1% are most commonly used
– Alternatively: P(significant) = .05 or .01
19

Each alpha has a corresponding Z-score
alpha

Z-score (two-sided test)

.10

1.65

.05

1.96

.01

2.58
20

The Z-score tells us how far a
particular value is from the
mean (and what the corresponding
likelihood is)
21

Source: assets.20bits.com/20081027/normal-curve-small.png
22

Compute the Z-score at the end of the test
23

Standard deviation (σ) tells us
how spread out the numbers
are
24
25

To lock in error rates before you
start, fix your sample size
26

What should my sample size be?
● To lock in error rates before you start a test, fix your sample size
Represents the
desired power
(typically .84 for 80%
power).

Sample size in each
group (assumes equal
sized groups)

2s (Z b + Za /2 )
n=
2
difference Represents the desired
2

Standard deviation of
the outcome variable

Source: www.stanford.edu/~kcobb/hrp259/lecture11.ppt

2

Effect Size (the
difference in
means)

level of statistical
significance (typically
1.96).
27

Recap: running an A/B test
● Compute your sample size
– Using alpha, beta, standard deviation of your metric, and effect size
● Run your test! But stop once you’ve reached the fixed sample size stopping point
● Compute your z-score and compare it with the z-score for the chosen alpha level
28

Control

A
29

Resulting Z-score?
30

33.3
Pitfall #2: Stopping your
test before the fixed
sample size stopping
point
32

Sample size for varying alpha levels
● With σ = 10, difference in means = 1

Two-sided test
alpha = .10, beta = .80

1230

alpha = .05, beta = .80

1568

alpha = .01, beta = .80

2339
33

Let’s see some numbers
● 1,000 experiments with 200,000 fake participants divided randomly into two groups both
receiving the exact same version, A, with a 3% conversion rate

90% significance
reached
95% significance
reached
99% significance
reached
Source: destack.home.xs4all.nl/projects/significance/

Stop at first point of
significance
654 of 1,000

Ended as significant

427 of 1,000

49 of 1,000

146 of 1,000

14 of 1,000

100 of 1,000
34

Remedies
● Don’t peek
● Okay, maybe you can peek, but don’t stop or make a decision before you reach the fixed
sample size stopping point
● Sequential sampling
35

Control

A

B
Pitfall #3: Making
multiple comparisons in
one test
37

A test can be one of two things: significant or not significant
● P(significant) + P(not significant) = 1
● Let’s take an alpha of .05
– P(significant) = .05
– P(not significant) = 1 – P(significant) = 1 - .05 = .95
38

What about for two comparisons?
● P(at least 1 significant) = 1 - P(none of the 2 are significant)
● P(none of the 2 are significant) = P(not significant)*P(not significant) = .95*.95 = .9025
● P(at least 1 significant) = 1 - .9025 = .0975
39

What about for two comparisons?

●That’s almost 2x (1.95x, to be precise) your .05
significance rate!
40

And it just gets worse…
P(at least 1 signifcant)

An increase of…

5 variations

1 – (1-.05)^5 = .23

4.6x

10 variations

1 – (1-.05)^10 = .40

8x

20 variations

1 – (1-.05)^20 = .64

12.8x
41

How can we remedy this?
●Bonferroni correction
– Divide P(significant), your alpha, by the number of variations you are testing, n
– alpha/n becomes the new level of statistical significance
42

So what about two comparisons now?
● Our new P(significant) = .05/2 = .025
● Our new P(not significant) = 1 - .025 = .975
● P(at least 1 significant) = 1 - P(none of the 2 are significant)
● P(none of the 2 are significant) = P(not significant)*P(not significant) = .975*.975 = .951
● P(at least 1 significant) = 1 - .951 = .0499
43

P(significant) stays under .05 
Corrected alpha

P(at least 1 signifcant)

5 variations

.05/5 = .01

1 – (1-.01)^5 = .049

10 variations

.05/10 = .005

1 – (1-.005)^10 = .049

20 variations

.05/20 = .0025

1 – (1-.0025)^20 = .049
Questions?
Appendix
46

A/B test steps:
1. Decide what to test
2. Determine a metric to test
3. Formulate your hypothesis
1. Select an effect size threshold: what change of the metric would make a rollout
worthwhile?
4. Calculate sample size (your stopping point)
1. Decide your Type I (alpha) and Type 2 (beta) error levels and the corresponding zscores
2. Determine the standard deviation of your metric
5. Run your test! But stop once you’ve reached the fixed sample size stopping point
6. Compute your z-score and compare it with the z-score for your chosen alpha level
47

Type I and Type II error
● Type I error: incorrectly reject a true null hypothesis
– alpha
● Type II error: incorrectly accept a false null hypothesis
– beta
– Power: 1 - beta
48

Z-score reference table
alpha

One-sided test

Two-sided test

.10

1.28

1.65

.05

1.65

1.96

.01

2.33

2.58
49

Z-score for proportions (e.g. conversion)

Contenu connexe

Tendances

Talks@Coursera - A/B Testing @ Internet Scale
Talks@Coursera - A/B Testing @ Internet ScaleTalks@Coursera - A/B Testing @ Internet Scale
Talks@Coursera - A/B Testing @ Internet Scalecourseratalks
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupAndy Sloane
 
Engagement, Metrics & Personalisation at Scale
Engagement, Metrics &  Personalisation at ScaleEngagement, Metrics &  Personalisation at Scale
Engagement, Metrics & Personalisation at ScaleMounia Lalmas-Roelleke
 
A/B Testing with Yammer's Product Manager
A/B Testing with Yammer's Product ManagerA/B Testing with Yammer's Product Manager
A/B Testing with Yammer's Product ManagerProduct School
 
A/B Testing at Pinterest: Building a Culture of Experimentation
A/B Testing at Pinterest: Building a Culture of Experimentation A/B Testing at Pinterest: Building a Culture of Experimentation
A/B Testing at Pinterest: Building a Culture of Experimentation WrangleConf
 
Personalized Playlists at Spotify
Personalized Playlists at SpotifyPersonalized Playlists at Spotify
Personalized Playlists at SpotifyRohan Agrawal
 
Music Recommendation 2018
Music Recommendation 2018Music Recommendation 2018
Music Recommendation 2018Fabien Gouyon
 
Ab testing 101
Ab testing 101Ab testing 101
Ab testing 101Ashish Dua
 
From Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyFrom Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyChris Johnson
 
Evidence-based Entrepreneurship by Steve Blank
Evidence-based Entrepreneurship by Steve BlankEvidence-based Entrepreneurship by Steve Blank
Evidence-based Entrepreneurship by Steve BlankLean Startup Co.
 
Machine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at SpotifyMachine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at SpotifyChing-Wei Chen
 
Supporting decisions with ML
Supporting decisions with MLSupporting decisions with ML
Supporting decisions with MLMegan Neider
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyChris Johnson
 
A/B Testing Best Practices - Do's and Don'ts
A/B Testing Best Practices - Do's and Don'tsA/B Testing Best Practices - Do's and Don'ts
A/B Testing Best Practices - Do's and Don'tsRamkumar Ravichandran
 
Practical Introduction to A/B Testing
Practical Introduction to A/B TestingPractical Introduction to A/B Testing
Practical Introduction to A/B TestingAlex Alwan
 
Spotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSpotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSophia Ciocca
 
Big data and machine learning @ Spotify
Big data and machine learning @ SpotifyBig data and machine learning @ Spotify
Big data and machine learning @ SpotifyOscar Carlsson
 
Counterfactual Learning for Recommendation
Counterfactual Learning for RecommendationCounterfactual Learning for Recommendation
Counterfactual Learning for RecommendationOlivier Jeunen
 
How data drives spotify
How data drives spotifyHow data drives spotify
How data drives spotifyAli Sarrafi
 

Tendances (20)

Talks@Coursera - A/B Testing @ Internet Scale
Talks@Coursera - A/B Testing @ Internet ScaleTalks@Coursera - A/B Testing @ Internet Scale
Talks@Coursera - A/B Testing @ Internet Scale
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data Meetup
 
Engagement, Metrics & Personalisation at Scale
Engagement, Metrics &  Personalisation at ScaleEngagement, Metrics &  Personalisation at Scale
Engagement, Metrics & Personalisation at Scale
 
A/B Testing with Yammer's Product Manager
A/B Testing with Yammer's Product ManagerA/B Testing with Yammer's Product Manager
A/B Testing with Yammer's Product Manager
 
A/B Testing at Pinterest: Building a Culture of Experimentation
A/B Testing at Pinterest: Building a Culture of Experimentation A/B Testing at Pinterest: Building a Culture of Experimentation
A/B Testing at Pinterest: Building a Culture of Experimentation
 
Personalized Playlists at Spotify
Personalized Playlists at SpotifyPersonalized Playlists at Spotify
Personalized Playlists at Spotify
 
Music Recommendation 2018
Music Recommendation 2018Music Recommendation 2018
Music Recommendation 2018
 
Ab testing 101
Ab testing 101Ab testing 101
Ab testing 101
 
From Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyFrom Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover Weekly
 
Evidence-based Entrepreneurship by Steve Blank
Evidence-based Entrepreneurship by Steve BlankEvidence-based Entrepreneurship by Steve Blank
Evidence-based Entrepreneurship by Steve Blank
 
Machine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at SpotifyMachine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at Spotify
 
Search @ Spotify
Search @ Spotify Search @ Spotify
Search @ Spotify
 
Supporting decisions with ML
Supporting decisions with MLSupporting decisions with ML
Supporting decisions with ML
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at Spotify
 
A/B Testing Best Practices - Do's and Don'ts
A/B Testing Best Practices - Do's and Don'tsA/B Testing Best Practices - Do's and Don'ts
A/B Testing Best Practices - Do's and Don'ts
 
Practical Introduction to A/B Testing
Practical Introduction to A/B TestingPractical Introduction to A/B Testing
Practical Introduction to A/B Testing
 
Spotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSpotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendations
 
Big data and machine learning @ Spotify
Big data and machine learning @ SpotifyBig data and machine learning @ Spotify
Big data and machine learning @ Spotify
 
Counterfactual Learning for Recommendation
Counterfactual Learning for RecommendationCounterfactual Learning for Recommendation
Counterfactual Learning for Recommendation
 
How data drives spotify
How data drives spotifyHow data drives spotify
How data drives spotify
 

En vedette

4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B TestingJanessa Lantz
 
Making Better Mistakes Tomorrow
Making Better Mistakes TomorrowMaking Better Mistakes Tomorrow
Making Better Mistakes TomorrowDanielle Jabin
 
Measuring team performance at spotify slideshare
Measuring team performance at spotify slideshareMeasuring team performance at spotify slideshare
Measuring team performance at spotify slideshareDanielle Jabin
 
Africa DevOps Day 2015
Africa DevOps Day 2015Africa DevOps Day 2015
Africa DevOps Day 2015Danielle Jabin
 
The math-behind-ab-testing
The math-behind-ab-testingThe math-behind-ab-testing
The math-behind-ab-testingAmit Sawhney
 
Managing Experiment at Spotify
Managing Experiment at SpotifyManaging Experiment at Spotify
Managing Experiment at SpotifyAli Sarrafi
 
Big Data At Spotify
Big Data At SpotifyBig Data At Spotify
Big Data At SpotifyAdam Kawa
 
Explore and Evaluate - A different intro to research and data
Explore and Evaluate - A different intro to research and dataExplore and Evaluate - A different intro to research and data
Explore and Evaluate - A different intro to research and dataBen Dressler
 
Being a Data Driven Business
Being a Data Driven Business Being a Data Driven Business
Being a Data Driven Business Ali Sarrafi
 
Growing up with agile - how the Spotify 'model' has evolved
Growing up with agile - how the Spotify 'model' has evolved Growing up with agile - how the Spotify 'model' has evolved
Growing up with agile - how the Spotify 'model' has evolved Peter Antman
 
An Intro to Learning Organization
An Intro to Learning OrganizationAn Intro to Learning Organization
An Intro to Learning OrganizationHakan Cuzdan
 
A/B Testing for Lean Startups
A/B Testing for Lean StartupsA/B Testing for Lean Startups
A/B Testing for Lean StartupsPete Mauro
 
Fluent at agile - agile sverige 2014
Fluent at agile - agile sverige 2014Fluent at agile - agile sverige 2014
Fluent at agile - agile sverige 2014Peter Antman
 
A/B Testing: You Might be Driving in the Wrong Direction
A/B Testing: You Might be Driving in the Wrong DirectionA/B Testing: You Might be Driving in the Wrong Direction
A/B Testing: You Might be Driving in the Wrong DirectionKissmetrics on SlideShare
 
온라인 서비스 개선을 데이터 활용법 - 김진영 (How We Use Data)
온라인 서비스 개선을 데이터 활용법  - 김진영 (How We Use Data)온라인 서비스 개선을 데이터 활용법  - 김진영 (How We Use Data)
온라인 서비스 개선을 데이터 활용법 - 김진영 (How We Use Data)Jin Young Kim
 
Product Owner presentation for Spotify
Product Owner presentation for SpotifyProduct Owner presentation for Spotify
Product Owner presentation for Spotifypdicorpo
 
eMetrics London - The AB Testing Hype Cycle
eMetrics London - The AB Testing Hype CycleeMetrics London - The AB Testing Hype Cycle
eMetrics London - The AB Testing Hype CycleCraig Sullivan
 
Testing a 2D Platformer with Spock
Testing a 2D Platformer with SpockTesting a 2D Platformer with Spock
Testing a 2D Platformer with SpockAlexander Tarlinder
 
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.js
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.jsNetflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.js
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.jsChris Saint-Amant
 

En vedette (20)

4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing
 
Making Better Mistakes Tomorrow
Making Better Mistakes TomorrowMaking Better Mistakes Tomorrow
Making Better Mistakes Tomorrow
 
Measuring team performance at spotify slideshare
Measuring team performance at spotify slideshareMeasuring team performance at spotify slideshare
Measuring team performance at spotify slideshare
 
Data at Spotify
Data at SpotifyData at Spotify
Data at Spotify
 
Africa DevOps Day 2015
Africa DevOps Day 2015Africa DevOps Day 2015
Africa DevOps Day 2015
 
The math-behind-ab-testing
The math-behind-ab-testingThe math-behind-ab-testing
The math-behind-ab-testing
 
Managing Experiment at Spotify
Managing Experiment at SpotifyManaging Experiment at Spotify
Managing Experiment at Spotify
 
Big Data At Spotify
Big Data At SpotifyBig Data At Spotify
Big Data At Spotify
 
Explore and Evaluate - A different intro to research and data
Explore and Evaluate - A different intro to research and dataExplore and Evaluate - A different intro to research and data
Explore and Evaluate - A different intro to research and data
 
Being a Data Driven Business
Being a Data Driven Business Being a Data Driven Business
Being a Data Driven Business
 
Growing up with agile - how the Spotify 'model' has evolved
Growing up with agile - how the Spotify 'model' has evolved Growing up with agile - how the Spotify 'model' has evolved
Growing up with agile - how the Spotify 'model' has evolved
 
An Intro to Learning Organization
An Intro to Learning OrganizationAn Intro to Learning Organization
An Intro to Learning Organization
 
A/B Testing for Lean Startups
A/B Testing for Lean StartupsA/B Testing for Lean Startups
A/B Testing for Lean Startups
 
Fluent at agile - agile sverige 2014
Fluent at agile - agile sverige 2014Fluent at agile - agile sverige 2014
Fluent at agile - agile sverige 2014
 
A/B Testing: You Might be Driving in the Wrong Direction
A/B Testing: You Might be Driving in the Wrong DirectionA/B Testing: You Might be Driving in the Wrong Direction
A/B Testing: You Might be Driving in the Wrong Direction
 
온라인 서비스 개선을 데이터 활용법 - 김진영 (How We Use Data)
온라인 서비스 개선을 데이터 활용법  - 김진영 (How We Use Data)온라인 서비스 개선을 데이터 활용법  - 김진영 (How We Use Data)
온라인 서비스 개선을 데이터 활용법 - 김진영 (How We Use Data)
 
Product Owner presentation for Spotify
Product Owner presentation for SpotifyProduct Owner presentation for Spotify
Product Owner presentation for Spotify
 
eMetrics London - The AB Testing Hype Cycle
eMetrics London - The AB Testing Hype CycleeMetrics London - The AB Testing Hype Cycle
eMetrics London - The AB Testing Hype Cycle
 
Testing a 2D Platformer with Spock
Testing a 2D Platformer with SpockTesting a 2D Platformer with Spock
Testing a 2D Platformer with Spock
 
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.js
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.jsNetflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.js
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.js
 

Similaire à A/B Testing Pitfalls and Lessons Learned at Spotify

Quantitative Aptitude Test (QAT)-Tips & Tricks
Quantitative Aptitude Test (QAT)-Tips & TricksQuantitative Aptitude Test (QAT)-Tips & Tricks
Quantitative Aptitude Test (QAT)-Tips & Trickscocubes_learningcalendar
 
Quantitative Aptitude Test (QAT)- Tips & Tricks
Quantitative Aptitude Test (QAT)- Tips & TricksQuantitative Aptitude Test (QAT)- Tips & Tricks
Quantitative Aptitude Test (QAT)- Tips & Tricksshwetavashishtha
 
Uji liliefors
Uji lilieforsUji liliefors
Uji lilieforsjojun
 
WEEK 6 – HOMEWORK 6 LANE CHAPTERS, 11, 12, AND 13; ILLOWSKY CHAP.docx
WEEK 6 – HOMEWORK 6  LANE CHAPTERS, 11, 12, AND 13; ILLOWSKY CHAP.docxWEEK 6 – HOMEWORK 6  LANE CHAPTERS, 11, 12, AND 13; ILLOWSKY CHAP.docx
WEEK 6 – HOMEWORK 6 LANE CHAPTERS, 11, 12, AND 13; ILLOWSKY CHAP.docxcockekeshia
 
Design of Experiments
Design of Experiments Design of Experiments
Design of Experiments Furk Kruf
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionRupak Roy
 
Tips & tricks for Quantitative Aptitude
Tips & tricks for Quantitative AptitudeTips & tricks for Quantitative Aptitude
Tips & tricks for Quantitative AptitudeAmber Bhaumik
 
CrashCourse_0622
CrashCourse_0622CrashCourse_0622
CrashCourse_0622Dexen Xi
 
Statistics for CRO - Conversion Conference London
Statistics for CRO - Conversion Conference LondonStatistics for CRO - Conversion Conference London
Statistics for CRO - Conversion Conference LondonTom Capper
 
Conversion Conference Berlin
Conversion Conference BerlinConversion Conference Berlin
Conversion Conference BerlinTom Capper
 
6.7 Percent Equations
6.7 Percent Equations6.7 Percent Equations
6.7 Percent EquationsJessca Lundin
 
Lecture_Wk08.pdf
Lecture_Wk08.pdfLecture_Wk08.pdf
Lecture_Wk08.pdfNiel89
 
Memorization of Various Calculator shortcuts
Memorization of Various Calculator shortcutsMemorization of Various Calculator shortcuts
Memorization of Various Calculator shortcutsPrincessNorberte
 
Week8 Live Lecture for Final Exam
Week8 Live Lecture for Final ExamWeek8 Live Lecture for Final Exam
Week8 Live Lecture for Final ExamBrent Heard
 

Similaire à A/B Testing Pitfalls and Lessons Learned at Spotify (20)

Quantitative Aptitude Test (QAT)-Tips & Tricks
Quantitative Aptitude Test (QAT)-Tips & TricksQuantitative Aptitude Test (QAT)-Tips & Tricks
Quantitative Aptitude Test (QAT)-Tips & Tricks
 
Quantitative Aptitude Test (QAT)- Tips & Tricks
Quantitative Aptitude Test (QAT)- Tips & TricksQuantitative Aptitude Test (QAT)- Tips & Tricks
Quantitative Aptitude Test (QAT)- Tips & Tricks
 
Uji liliefors
Uji lilieforsUji liliefors
Uji liliefors
 
WEEK 6 – HOMEWORK 6 LANE CHAPTERS, 11, 12, AND 13; ILLOWSKY CHAP.docx
WEEK 6 – HOMEWORK 6  LANE CHAPTERS, 11, 12, AND 13; ILLOWSKY CHAP.docxWEEK 6 – HOMEWORK 6  LANE CHAPTERS, 11, 12, AND 13; ILLOWSKY CHAP.docx
WEEK 6 – HOMEWORK 6 LANE CHAPTERS, 11, 12, AND 13; ILLOWSKY CHAP.docx
 
Quant tips edited
Quant tips editedQuant tips edited
Quant tips edited
 
7. the t distribution
7. the t distribution7. the t distribution
7. the t distribution
 
Design of Experiments
Design of Experiments Design of Experiments
Design of Experiments
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
MATHS
MATHSMATHS
MATHS
 
Tips & tricks for Quantitative Aptitude
Tips & tricks for Quantitative AptitudeTips & tricks for Quantitative Aptitude
Tips & tricks for Quantitative Aptitude
 
CrashCourse_0622
CrashCourse_0622CrashCourse_0622
CrashCourse_0622
 
Statistics for CRO - Conversion Conference London
Statistics for CRO - Conversion Conference LondonStatistics for CRO - Conversion Conference London
Statistics for CRO - Conversion Conference London
 
01_SLR_final (1).pptx
01_SLR_final (1).pptx01_SLR_final (1).pptx
01_SLR_final (1).pptx
 
Conversion Conference Berlin
Conversion Conference BerlinConversion Conference Berlin
Conversion Conference Berlin
 
6.7 Percent Equations
6.7 Percent Equations6.7 Percent Equations
6.7 Percent Equations
 
Lecture_Wk08.pdf
Lecture_Wk08.pdfLecture_Wk08.pdf
Lecture_Wk08.pdf
 
Memorization of Various Calculator shortcuts
Memorization of Various Calculator shortcutsMemorization of Various Calculator shortcuts
Memorization of Various Calculator shortcuts
 
Machine learning mathematicals.pdf
Machine learning mathematicals.pdfMachine learning mathematicals.pdf
Machine learning mathematicals.pdf
 
Lecture 4
Lecture 4Lecture 4
Lecture 4
 
Week8 Live Lecture for Final Exam
Week8 Live Lecture for Final ExamWeek8 Live Lecture for Final Exam
Week8 Live Lecture for Final Exam
 

Dernier

International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...ssuserf63bd7
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchirictsugar
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxMarkAnthonyAurellano
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Kirill Klimov
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfrichard876048
 
Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03DallasHaselhorst
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menzaictsugar
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africaictsugar
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfJos Voskuil
 
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfpollardmorgan
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Seta Wicaksana
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...lizamodels9
 
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...ictsugar
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyotictsugar
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?Olivia Kresic
 
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,noida100girls
 
Buy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy Verified Accounts
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Servicecallgirls2057
 
Future Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionFuture Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionMintel Group
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMintel Group
 

Dernier (20)

International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchir
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdf
 
Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africa
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdf
 
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
 
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyot
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?
 
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
 
Buy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail Accounts
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
 
Future Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionFuture Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted Version
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 Edition
 

A/B Testing Pitfalls and Lessons Learned at Spotify

  • 1. A/B Testing: Avoiding Common Pitfalls Danielle Jabin March 5, 2014
  • 2. 1 Make all the world’s music available instantly to everyone, wherever and whenever they want it
  • 3. 2 As of March 5, 2014
  • 4. 3 Over 24 million active users As of March 5, 2014
  • 5. 4 Access to more than 20 million songs As of March 5, 2014
  • 6. 5
  • 7. 6 But can we make it even easier?
  • 8. 7 We can try… …with A/B testing!
  • 11. Pitfall #1: Not limiting your error rate
  • 13. 12 What if I flip a coin 100 times and get 51 heads?
  • 14. 13 What if I flip a coin 100 times and get 5 heads?
  • 15. 14
  • 16. 15 The likelihood of obtaining a certain value under a given distribution is measured by its p-value
  • 17. 16 If there is a low likelihood that a change is due to chance alone, we call our results statistically significant
  • 18. 17 What if I flip a coin 100 times and get 5 heads?
  • 19. 18 Statistical significance is measured by alpha ● alpha levels of 5% and 1% are most commonly used – Alternatively: P(significant) = .05 or .01
  • 20. 19 Each alpha has a corresponding Z-score alpha Z-score (two-sided test) .10 1.65 .05 1.96 .01 2.58
  • 21. 20 The Z-score tells us how far a particular value is from the mean (and what the corresponding likelihood is)
  • 23. 22 Compute the Z-score at the end of the test
  • 24. 23 Standard deviation (σ) tells us how spread out the numbers are
  • 25. 24
  • 26. 25 To lock in error rates before you start, fix your sample size
  • 27. 26 What should my sample size be? ● To lock in error rates before you start a test, fix your sample size Represents the desired power (typically .84 for 80% power). Sample size in each group (assumes equal sized groups) 2s (Z b + Za /2 ) n= 2 difference Represents the desired 2 Standard deviation of the outcome variable Source: www.stanford.edu/~kcobb/hrp259/lecture11.ppt 2 Effect Size (the difference in means) level of statistical significance (typically 1.96).
  • 28. 27 Recap: running an A/B test ● Compute your sample size – Using alpha, beta, standard deviation of your metric, and effect size ● Run your test! But stop once you’ve reached the fixed sample size stopping point ● Compute your z-score and compare it with the z-score for the chosen alpha level
  • 32. Pitfall #2: Stopping your test before the fixed sample size stopping point
  • 33. 32 Sample size for varying alpha levels ● With σ = 10, difference in means = 1 Two-sided test alpha = .10, beta = .80 1230 alpha = .05, beta = .80 1568 alpha = .01, beta = .80 2339
  • 34. 33 Let’s see some numbers ● 1,000 experiments with 200,000 fake participants divided randomly into two groups both receiving the exact same version, A, with a 3% conversion rate 90% significance reached 95% significance reached 99% significance reached Source: destack.home.xs4all.nl/projects/significance/ Stop at first point of significance 654 of 1,000 Ended as significant 427 of 1,000 49 of 1,000 146 of 1,000 14 of 1,000 100 of 1,000
  • 35. 34 Remedies ● Don’t peek ● Okay, maybe you can peek, but don’t stop or make a decision before you reach the fixed sample size stopping point ● Sequential sampling
  • 37. Pitfall #3: Making multiple comparisons in one test
  • 38. 37 A test can be one of two things: significant or not significant ● P(significant) + P(not significant) = 1 ● Let’s take an alpha of .05 – P(significant) = .05 – P(not significant) = 1 – P(significant) = 1 - .05 = .95
  • 39. 38 What about for two comparisons? ● P(at least 1 significant) = 1 - P(none of the 2 are significant) ● P(none of the 2 are significant) = P(not significant)*P(not significant) = .95*.95 = .9025 ● P(at least 1 significant) = 1 - .9025 = .0975
  • 40. 39 What about for two comparisons? ●That’s almost 2x (1.95x, to be precise) your .05 significance rate!
  • 41. 40 And it just gets worse… P(at least 1 signifcant) An increase of… 5 variations 1 – (1-.05)^5 = .23 4.6x 10 variations 1 – (1-.05)^10 = .40 8x 20 variations 1 – (1-.05)^20 = .64 12.8x
  • 42. 41 How can we remedy this? ●Bonferroni correction – Divide P(significant), your alpha, by the number of variations you are testing, n – alpha/n becomes the new level of statistical significance
  • 43. 42 So what about two comparisons now? ● Our new P(significant) = .05/2 = .025 ● Our new P(not significant) = 1 - .025 = .975 ● P(at least 1 significant) = 1 - P(none of the 2 are significant) ● P(none of the 2 are significant) = P(not significant)*P(not significant) = .975*.975 = .951 ● P(at least 1 significant) = 1 - .951 = .0499
  • 44. 43 P(significant) stays under .05  Corrected alpha P(at least 1 signifcant) 5 variations .05/5 = .01 1 – (1-.01)^5 = .049 10 variations .05/10 = .005 1 – (1-.005)^10 = .049 20 variations .05/20 = .0025 1 – (1-.0025)^20 = .049
  • 47. 46 A/B test steps: 1. Decide what to test 2. Determine a metric to test 3. Formulate your hypothesis 1. Select an effect size threshold: what change of the metric would make a rollout worthwhile? 4. Calculate sample size (your stopping point) 1. Decide your Type I (alpha) and Type 2 (beta) error levels and the corresponding zscores 2. Determine the standard deviation of your metric 5. Run your test! But stop once you’ve reached the fixed sample size stopping point 6. Compute your z-score and compare it with the z-score for your chosen alpha level
  • 48. 47 Type I and Type II error ● Type I error: incorrectly reject a true null hypothesis – alpha ● Type II error: incorrectly accept a false null hypothesis – beta – Power: 1 - beta
  • 49. 48 Z-score reference table alpha One-sided test Two-sided test .10 1.28 1.65 .05 1.65 1.96 .01 2.33 2.58
  • 50. 49 Z-score for proportions (e.g. conversion)