A compendium of the most common mistakes and problems people encounter when trying to optimise or split test cross device experiences (mobile, tablet, desktop, app, tv etc.)
Myths and Illusions of Cross Device Testing - Elite Camp June 2015
1. The Myths, Lies and Illusions of Cross Device Testing
Craig Sullivan, Optimiser of Everything, @OptimiseOrDie
2. @OptimiseOrDie
• Split Testing, Analytics, UX, Agile, Lean, Growth
• 50M+ visitors tested, 19 languages, over 200 sites
• 70+ mistakes I’ve made personally during testing
• Like riding a bike, really really badly…
• Optimise or Kickstart your programme?
• Get in touch!
9. Cross Device Testing Myths
1. Responsive solves everything
2. All our customers are on iPhones, right?
3. The customer journey is in your head
4. You don’t integrate with analytics
5. You think you’re tracking people
6. You only imagine the context
7. We think we have a hypothesis thingy
8. You think best practice is other tests
9. You just start testing, right?
10. What you see is what you get
11. 95% confidence is enough for me!
12. You always get the promised lift
13. Segmentation is too hard
14. Who cares if it’s a phone?
15. Testing makes you a data scientist
@OptimiseOrDie
18. 1. Motorola Hardware Menu Button
2. MS Word Bullet Button
3. Android Holo Composition Icon
4. Android Context Action Bar Overflow (top right on Android devices)
20. Increase in revenue of > $200,000 per annum!
bit.ly/hamburgertest
21. 2: All our Customers use iPhones, right?
• Most common answer is “iPhones and iPads”
• Do you really know your mix?
• Most people undercount Android
• Use Google Analytics to find out
• Replace guesswork with truth!
• 2-3 hours work only
• Get your top testing mix right on:
Desktop, Tablet, Mobile
• I’m writing an article – why?
25. 2: Browser reports
• Be very careful when doing numbers on desktop browsers, tablets or mobile devices
• Chrome and Safari are on auto upgrades
• All browsers ‘upgrade’ at different speeds
• Chrome and Firefox versions are fragmented as a result
• Same for mobile – there are thousands of Android handsets from many makers, but the models are similar
• When looking at Apple, you need to split by model (you can use the resolution to figure this out)
• iPads you can’t distinguish in GA*
• Is the analysis you did really true?
• Cluster your devices or browsers where needed!
• Article is coming!
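Splitting Apple devices by model from resolution data can be sketched like this. The mapping below is illustrative only: the values are the CSS-point screen sizes of 2015-era iPhones, and you should verify them against what your analytics tool actually reports before relying on the split.

```python
# Illustrative mapping from reported screen size (CSS points) to a likely
# iPhone model, for 2015-era devices. Verify these values against what
# your analytics tool reports before using them in anger.
IPHONE_BY_RESOLUTION = {
    "320x480": "iPhone 4 / 4S (or earlier)",
    "320x568": "iPhone 5 / 5S / 5C",
    "375x667": "iPhone 6",
    "414x736": "iPhone 6 Plus",
}

def classify_iphone(resolution):
    """Map a 'WxH' screen-resolution string to a likely iPhone model."""
    return IPHONE_BY_RESOLUTION.get(resolution, "unknown / not an iPhone")

print(classify_iphone("320x568"))   # iPhone 5 / 5S / 5C
```

Note the asterisk on the slide: iPads of this era all report the same resolution, which is exactly why they can’t be told apart this way.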
29. 3: The Customer Journey is in your Head
• The routes people take are not what we expect.
• Analytics data and usability research are big pointers!
• The most common problem is the team not owning, experiencing and being immersed in the problems with your key journeys
• One charity wasted nearly 0.5M on a poor pathway
• Can you imagine someone from McDonald’s never visiting their stores?
“So that’s how it looks on mobile!”
30. 3: Customer Journey - Solutions
“Great user experiences happen beyond the screen and in the gaps.”
Paul Boag
• Test ALL key campaigns
• Use Real Devices
• Get your own emails
• Order your own products
• Call the phone numbers
• Send an email
• Send stuff back
• Be difficult
• Break things
• Experience the end-end
• Team are ALL mystery shoppers
• Wear the magical slippers of the actual customer experience!
• Be careful about dogfood though!
31. 4: Our AB testing tool tells us all we need…
• Investigating problems with tests
• Tests that fail, flip or move around
• Tests that don’t make sense
• Broken test setups
• Segmenting by Mobile, Tablet, Desktop
• Other customer segments
• What drives the averages you see?
32. 5: You think you’re tracking people
[Diagram: the same purchase funnel – Product → Basket → Shipping → Details → Pay – shown side by side for Mobile and Desktop]
33. 5: You think you’re tracking people
• Keep people logged in
• Use Social Logins
• Identify unique customers
• Feed this data to Universal Analytics
• Follow users, not just device experiences
• It’s an attribution problem!
34. 5: Get a User ID View with Google Analytics
35. 5: Get a User ID View with Google Analytics
36. 6: You only imagine the context
• Tasks
• Goals
• Device
• Location
• Data rate
• Viewport
• Urgency
• Motivation
• Data costs
• Call costs
• Weather!
37. 6: You only imagine the context - solutions
bit.ly/multichannels
38. 7: Other people’s tests are Best Practice
“STOP copying your competitors
They may not know what the
f*** they are doing either”
Peep Laja, ConversionXL
39. Best Practice Testing?
• Your customers are not the same
• Your site is not the same
• Your advertising and traffic is not the same
• Your UX is not the same
• Your X-Device Mix is not the same
• Use them to inform or suggest approaches
• They’re like the picture on meal packets
• Serving Suggestion Only
• There are obvious BEST PRACTICES but these are usually in the category of ‘bugs’ or ‘UX problems’ – just fix those now!
40. Best Practice Testing?
“The Endless Suck of Best Practice
and Optimisation Experts”
bit.ly/socalledexperts
41. 8: You think you have a Hypothesis!
Insight inputs – #FAIL: competitor copying, guessing, dice rolling, an article the CEO read, competitor change, panic, ego, opinion, cherished notions, marketing whims, cosmic rays, not ‘on brand’ enough, IT inflexibility, internal company needs, some dumbass consultant, shiny feature blindness, knee-jerk reactions.
42. 8: These are the inputs you need…
Insight inputs: segmentation, surveys, sales and call centre, session replay, social analytics, customer contact, eye tracking, usability testing, forms analytics, search analytics, voice of customer, market research, A/B and MVT testing, big & unstructured data, web analytics, competitor evals, customer services.
43. Because we observed data [A] and feedback [B],
we believe that doing [C] for People [D] will make outcome [E] happen.
We’ll know this when we observe data [F] and obtain feedback [G].
(Reverse this)
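The template can be captured as a small structure so every test idea gets challenged the same way before it goes anywhere near a tool. A sketch only; the class and field names are my own invention:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """The [A]-[G] slots of the hypothesis template."""
    data: str              # [A] data we observed
    feedback: str          # [B] feedback we gathered
    change: str            # [C] what we will do
    audience: str          # [D] who it is for
    outcome: str           # [E] what we expect to happen
    success_data: str      # [F] data that will confirm it
    success_feedback: str  # [G] feedback that will confirm it

    def render(self) -> str:
        return (
            f"Because we observed {self.data} and feedback {self.feedback}, "
            f"we believe that doing {self.change} for {self.audience} "
            f"will make {self.outcome} happen. "
            f"We'll know this when we observe {self.success_data} "
            f"and obtain feedback {self.success_feedback}."
        )

h = Hypothesis(
    data="rising basket abandonment",
    feedback="exit-survey complaints about checkout",
    change="a one-page checkout",
    audience="mobile visitors",
    outcome="higher checkout completion",
    success_data="a lift in completion rate",
    success_feedback="fewer checkout complaints",
)
print(h.render())
```

If you can’t fill every slot, you don’t have a hypothesis yet – you have an opinion.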
44. Because our CEO had an idea that nobody else agreed with,
we believe that putting Orange Buttons on our Homepage will make people feel ‘Funkier’.
We’ll know this when…
45. 9: You just start testing, right?
• Baseline checks
• Analytics health check
• Developer onboarding
• Goals & metrics
• Tool setup
• Analytics & modelling
• Discover ideas
• Prioritise
• Test cycles
49. 10: What you see is what you get…
50. 11: You just stop at 95% confidence, right?
51. The 95% Stopping Problem
• Many people use 95% or 99% ‘confidence’ to stop
• This value is unreliable and moves around
• Nearly all my tests reach significance before they are actually ready
• You can hit 95% early in a test (18 minutes!)
• If you stop, it could be a false result
• Read this Nature article: bit.ly/1dwk0if
• Optimizely have changed their stats engine
• This 95% thingy is just the cherry on the cake!
• Let me explain
52. The 95% Stopping Problem
                        Scenario 1      Scenario 2      Scenario 3      Scenario 4
After 200 observations  Insignificant   Insignificant   Significant!    Significant!
After 500 observations  Insignificant   Significant!    Insignificant   Significant!
End of experiment       Insignificant   Significant!    Insignificant   Significant!

“You should know that stopping a test once it’s significant is deadly sin number 1 in A/B testing land. 77% of A/A tests (testing the same thing as A and B) will reach significance at a certain point.”
Ton Wesseling, Online Dialogue
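The inflation Ton describes is easy to demonstrate yourself: simulate A/A tests (two identical variants) and ‘peek’ at a z-test after every batch of visitors, stopping the moment p < 0.05. A stdlib-only sketch; all the traffic numbers are made up for illustration:

```python
import math
import random

def two_proportion_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = abs(conv_a / n_a - conv_b / n_b) / se
    return 1 - math.erf(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

def peeking_false_positive_rate(sims=500, peeks=20, batch=100, cr=0.05):
    """Fraction of A/A tests declared 'significant' at some peek."""
    random.seed(1)
    stopped = 0
    for _ in range(sims):
        ca = cb = na = nb = 0
        for _ in range(peeks):
            na += batch
            nb += batch
            ca += sum(random.random() < cr for _ in range(batch))
            cb += sum(random.random() < cr for _ in range(batch))
            if two_proportion_p(ca, na, cb, nb) < 0.05:
                stopped += 1   # we 'stopped the test' on a false positive
                break
    return stopped / sims

rate = peeking_false_positive_rate()
print(f"A/A tests that hit 'significance' at some point: {rate:.0%}")
```

With twenty peeks, far more than 5% of these no-difference tests cross the 95% line at some point – which is the whole stopping problem in one number.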
53. The 95% Stopping Problem
“Statistical Significance does not equal Validity”
http://bit.ly/1wMfmY2
“Why every Internet Marketer should be a Statistician”
http://bit.ly/1wMfs1G
“Understanding the Cycles in your site”
http://mklnd.com/1pGSOUP
54. Business & Purchase Cycles
• Customers change
• Your traffic mix changes
• Markets, competitors
• Be aware of all the waves
• Always test whole cycles
• Don’t exclude slower buyers
• When you stop, let test subjects still complete!
[Chart: test start and finish overlaid on the average purchase cycle]
55. How Long to Run My Test and When to Stop
• TWO BUSINESS CYCLES minimum (week/month)
• 1 PURCHASE CYCLE minimum
• 250 CONVERSIONS minimum per creative
• 350 & MORE! It depends on response
• FULL WEEKS/CYCLES, never part of one
• KNOW what marketing, competitors and cycles are doing
• RUN a test length calculator – bit.ly/XqCxuu
• SET your test run time, RUN IT, STOP IT, ANALYSE IT
• ONLY RUN LONGER if you need more data
• DON’T RUN LONGER just because the test isn’t giving the result you want!
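A test length calculator boils down to a standard sample-size calculation. This is a sketch of what such calculators compute (two-proportion normal approximation); the 3% baseline, 10% lift and 5,000 visitors/day below are made-up example numbers, not recommendations:

```python
import math

# z-scores for common choices of significance (two-sided) and power
Z_ALPHA = {0.05: 1.960, 0.01: 2.576}
Z_POWER = {0.80: 0.842, 0.90: 1.282}

def visitors_per_variant(baseline_cr, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect a relative lift.

    Standard two-proportion normal approximation:
        n = (z_alpha + z_power)^2 * 2 * p(1-p) / (p * lift)^2
    """
    p = baseline_cr
    delta = p * relative_lift            # absolute difference to detect
    z = Z_ALPHA[alpha] + Z_POWER[power]
    return math.ceil(z ** 2 * 2 * p * (1 - p) / delta ** 2)

n = visitors_per_variant(0.03, 0.10)     # 3% CR, detect a 10% relative lift
days = math.ceil(2 * n / 5000)           # 50/50 split at 5,000 visitors/day
print(n, "visitors per variant, roughly", days, "days")
```

Set the run time from a calculation like this up front, then apply the slide’s rule: run it, stop it, analyse it – and still round up to full weeks and whole business cycles.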
56. 12: The test result gives the promised lift
57. The result is a range
• Version A is 3% conversion
• Version B is 4% conversion
• Yay! That’s a 25% lift
• Let’s tell everyone
• When it goes live, you do NOT get 25%
• That’s because it was A RANGE
• 3% +/- 0.5 (could be 2.5-3.5)
• 4% +/- 0.4 ( could be 3.6-4.4)
• Actual result was 3.5% for A
• Actual result was 3.7% for B
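Those ranges can be computed directly. A sketch using a simple Wald interval; the visitor counts are invented to roughly reproduce the 3% vs 4% example above:

```python
import math

def conversion_ci(conversions, visitors, z=1.96):
    """95% Wald confidence interval for an observed conversion rate."""
    p = conversions / visitors
    half_width = z * math.sqrt(p * (1 - p) / visitors)
    return p - half_width, p + half_width

# Invented counts: A converts 3% of 4,000 visitors, B converts 4% of 4,000.
for label, conv, n in [("A", 120, 4000), ("B", 160, 4000)]:
    lo, hi = conversion_ci(conv, n)
    print(f"{label}: {conv/n:.1%}  (range {lo:.1%} to {hi:.1%})")
```

With these numbers A’s range is roughly 2.5–3.5% and B’s roughly 3.4–4.6% – the ranges overlap, which is exactly why quoting “a 25% lift” as a single number oversells the result.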
59. Always Segment Experiences
• If you segment by devices, the sample gets smaller.
• A = 350 conversions
• B = 300 conversions
• Desktop A 200, Tablet A 100, Mobile A 50
• Desktop B 180, Tablet B 80, Mobile B 40
• It’s vital to segment by device class
• You may also segment by breakpoint, viewport or model
• Make sure you know the proportion of devices!
• If you want to analyse, plan ahead!
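To see why the shrinking samples matter, compare the uncertainty per segment. The conversion counts echo the slide’s variant A; the visitor totals are invented for illustration:

```python
import math

def half_width(conversions, visitors, z=1.96):
    """Half-width of a 95% Wald interval for the conversion rate."""
    p = conversions / visitors
    return z * math.sqrt(p * (1 - p) / visitors)

# Conversion counts from the slide (variant A); visitor totals invented.
segments = [
    ("All devices", 350, 10000),
    ("Desktop",     200,  5700),
    ("Tablet",      100,  2900),
    ("Mobile",       50,  1400),
]
for name, conv, n in segments:
    print(f"{name:12s} {conv/n:5.1%} +/- {half_width(conv, n):.2%}")
```

The mobile segment’s interval comes out several times wider than the overall one: the smaller the slice, the less the per-segment number can tell you – so plan segment sample sizes before the test, not after.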
60. 13: Wow – the browser is a phone too!
• Add call tracking
• Buy a solution or
• Make your own!
• Measure calls
• ROI on phone mix
• Vital for PPC
• Explain the costs or
• Make it free
61. 14: SUMMARY
• Responsive solves everything → No, it’s just an attribute
• All our customers are on iPhones, right? → Make sure you know what they use!
• The customer journey is in your head → Customer insight, research, data
• You don’t integrate with analytics → Use analytics, not the AB test data
• You think you’re tracking people → Have an authentication strategy
• You only imagine the context → Customer insight, diary studies
• We think we have a hypothesis thingy → Challenge all work with my outline
• You think best practice is other tests → Leverage your customers, not theirs
62. 14: SUMMARY
• You just start testing, right? → Preparation, methodology, prioritisation
• What you see is what you get → QA testing with your Customer Shizzle
• 95% confidence is enough for me! → Don’t stop tests when they hit 95%
• You always get the promised lift → Quote ranges, not predictions
• Segmentation is too hard → Segmentation – watch sample sizes
• Who cares if it’s a phone? → Add call tracking or add ‘Tap 2 Call’
• Testing makes you a data scientist → No, it doesn’t – it makes you humble
63. Testing makes you all Data Scientists
68. Rumsfeldian Space
• What if we changed our prices?
• What if we gave away less for free?
• What if we took this away?
• What about 3 packages, not 5?
• What are these potential futures I can take?
• How can I know before I spend money?
• UPS left-hand-turn avoidance – 10 million gallons saved
• http://compass.ups.com/UPS-driver-avoid-left-turns/
• McDonalds Hipster Test Store
• bit.ly/1TiURi7
69. The 5-Legged Optimisation Barstool
#1 Culture & Team
#2 Toolkit & Analytics investment
#3 UX, CX, Service Design, Insight
#4 Persuasive Copywriting
#5 Experimentation (testing) tools
And here’s a boring slide about me – and where I’ve been driving over 400M of additional revenue in the last few years. For the sharp-eyed amongst you, you’ll see that the name ‘Lean UX’ hasn’t been around since 2008 – many startups and teams were doing this stuff before it got a new name, even if the approach was slightly different. For the last 4 years, I’ve been optimising sites using a blend of these techniques.
And here are some of the clients I’ve been working for.
Dull bit is now officially over.
And don’t worry – if it’s not working for you – and looks like this, it’s OK – you’re just doing it the wrong way.
Although I admire AB testing companies - all of them - for championing the right to test and making it easy for anyone to implement - there's a problem. Democratisation of testing brings with it a large chunk of stupidity too.
When YouTube first appeared, did anyone think “Oh boy, there’s only ever going to be high-quality content to see on here”? Seriously. No.
And this crappy AB testing is basically the equivalent of funny cat videos
People taking videos of themselves playing video games
And like, wow, there are 6.9 million Gangnam Style videos. Just incredible.
But hidden in those big numbers, YouTube will always have a tiny percentage of really great stuff, very little good stuff and a long tail of absolute bollocks.
And the same is true of split testing - there's some really well run stuff, getting very good results and there's a lot of air guitar going on.
It has taken me a long time to find out where all the bear traps are hidden. Mainly from screwing up tests and figuring out what was wrong, through lots of testing time.
And most companies and teams are stepping on these bear traps without even realising. And they wonder why the test results aren’t replicated in the bank account results. Hah.
I have a list now of about 60 ways to easily break, skew, bias or screw up your tests completely. But here are some real biggies to watch for:
I once explained to my daughter – you know when adults look really in control, making decisions and appearing not to suffer from indecision? Don’t believe it for a minute – we’re just better at winging it because we’re older.
And this is the huge hole that’s gnawing at the heart of many digital operations: the inability to understand what you can and can’t be confident about – but nobody wants to admit they’re guessing a lot of the time.
There is one answer to this trap, which I call taking a visit to Guessaholics Anonymous: surrendering to the higher power of testing and innovation, using consumer psychological insight and data to guide your hand, and recognising you’re powerless at deciding what’s best or second-guessing what will win.
It's actually liberating to not be sitting in a meeting room, arguing about the wording of a bloody button for 4 fucking hours, ever again.
And this was the state of my head in 2004. The inability to understand what you can and can’t be confident about – but nobody wants to admit they’re fucking guessing a lot of the time.
And it took me a long time to figure out I didn’t know anything really – it was all assumptions and cherished notions. It was pretty crushing to test my way to this realisation, but I’m MUCH happier now.
Now I think I know this much - but I might know a wee bit more than I think I do – but I’m erring on the side of caution.
That’s because I'm always questioning everything I do through the lens of that consumer insight and testing.
Without customers and data driven insights, you can’t shape revenue and delight. They’ll give you the very psychological insights you need to apply levers to influence them, if you only ask questions. Everything else is just a fucking guess.
Even with tests, if the only inputs you’ve got are ego and opinion, they’re going to be lousy guesses and you’re wasting your experiments.
And now a bit about something I call Rumsfeldian Space – exploring the unknowns. This is vital if you want to make your testing bold enough to get great results.
You need to inhabit the contextual and emotional landscape of the consumer to really shape product or service experience. The only way to do this is have teams and cultures that create a direct and meaningful connection between teams and the customer, in the impact that every change has on the outcome.
Every atom of every piece of copy, design, error message, email, website, support and help content – absolutely bloody everything you do – has to be framed within knowledge of, and empathy with, the consumer’s fears, worries, barriers and pain, but also the real problems we solve by designing products not as features but as life-enhancing. And this is the best marketing of all, like the IBM ad.
Business Model Optimisation requires a watchmaker’s eye – a complete understanding of the watch from macro to micro, the flow of delight and money that can be shaped inside every customer experience, website and interaction, at a component and a service design level.
Most people have 1 or 2 legs at most. The best companies I've worked with are doing all of these.
Darwin did NOT say 'survival of the fittest' – that was actually another guy called Herbert Spencer. What Darwin actually pushed was that the key ingredients were heritability of traits, variation and selection based on survival. If only your marketing programme was quite as ruthless eh?
And if you want variation and innovation, the survival of good ideas in favour of bad and knowledge that you pass on – you need a culture of adaptability, improvement and change. Agile is about a shared mind-set across managers, leaders and everyone in the team.
There’s a Harvard survey about how the *most* productive teams communicate. Not in meetings but all the time - deskside, IM, phone, skype, GitHub, agile tools, apps - these are the telegraph wires of the collaborative, participative and mission oriented teams.
My key insight of the last 10 years in building and leading teams is that agile, open, flat, cross-silo, participative, flexible and collaborative environments produce customer-connected products of high quality. Autoglass’s NPS came out higher than Apple’s.
I hope you enjoyed it as much as I did writing it. All my details are here and slides will be uploaded shortly.
Thank you for your time today.