9. @AlexisKSanders
Analytics is a moving target; keep iterating:
• getting answers to questions
• automating pulse-reporting
• building interactive analytics that allow us to derive insights
• etc.
Progressive development (example)
10. @AlexisKSanders
• Don’t worry about having the perfect report the first time
(accurate insights are a great starting point)
• Incremental improvement in analytics reporting is normal
• Get the data and insights you need first, then automate,
and iterate
• Use least-resistance design mentality!
13. @AlexisKSanders
Separate out metrics: KPI, diagnostic measure, smoke alarm
KPI: a business outcome or measure of success
Diagnostic measure: a metric used to identify which lever(s) will have the most impact on KPIs
Smoke alarm: a metric no one pays attention to unless it suddenly goes way up or down
14. @AlexisKSanders
A metric is not a KPI unless…
• It measures performance against a goal
• Someone is accountable for performance
• There is context for whether the value is good or bad
15. @AlexisKSanders
Know your analytics package
• Understand what the metrics mean (e.g., what qualifies as a new user? 30-day last click?)
• Know what you’re working with: what metrics/dimensions? What are your top three KPIs?
• What data are you capturing? Is the site leveraging the data layer? Do you have channel stacking?
16. @AlexisKSanders
Determine your top data views
[Visual: the data → information → knowledge → insight → wisdom pyramid, mapped to metrics → segmented data → dashboarding & reporting → insights (data forensics) → strategy]
visual modified from @smrvl via @DannyProl
17. @AlexisKSanders
Evaluating fluctuations in organic search performance
• Seasonality
• Another channel’s performance
• What’s going on in paid? Social? Email?
• External
• Algorithm shifts
• Site changes
• Content added, removed, etc.
• Tracking/analytics issues, misattributions
18. @AlexisKSanders
Maintain (a) list(s) of:
• Significant site updates (both good and bad)
• Your site updates
• Other digital channels’ updates
• Company in the news
• Relevant search industry updates
19. @AlexisKSanders
Know what’s happening on your site
Visual Ping
Uptime Robot
Little Warden
Selenium/Jenkins
Repeat with me:
I am not a robot, I have a life.
I am going to use a robot to do this.
20. @AlexisKSanders
•Know your KPIs
•Know what could cause data fluctuations versus
shifts and how to identify them
•Determine your top data views
•Block off calendar time for building case studies
22. @AlexisKSanders
Concepts important for dashboards
visualization = important for quick understanding
microcopy = important for broad audiences
clarity = important for ensuring communication
automation = important for our sanity
24. @AlexisKSanders
Integrations w/ Google Data Studio:
• Google Analytics
• Google Search Console
• Google Sheets
• SEMrush (if you’re tracking keywords)
• Adobe (if you have Adobe, you can use Workspace)
• Supermetrics
30. @AlexisKSanders
• Be lazier
• Automate repeat analyses
• Get a working case study template
• Maintain a record of performance-impacting site events
• Know your data
36. @AlexisKSanders
We notice things that stand out (aka, pre-attentive features)
https://medium.com/design-at-zoopla/building-purposeful-ui-using-pre-attentive-attributes-9c5ee5dcc25c; The Big Book of Dashboards
42. @AlexisKSanders
Be careful with pie charts; it’s hard to differentiate small differences.
Try it: pick a pie chart and order it from most to least.
The Big Book of Dashboards;
https://www.businessinsider.com/pie-charts-are-the-worst-2013-6
44. @AlexisKSanders
• Play around with different visuals
• Use whichever best answers the question
• Use business question as chart title
• Avoid repetitiveness
• Clarity > “coolness”
• Answer: I choose to _____ because “it looks pretty” or “it communicates better”
• Check out, read, or skim The Big Book of Dashboards by Wexler, Shaffer, Cotgreave
47. @AlexisKSanders
When to use
• dataset = too big for Excel (and you’re sick of waiting)
• want to connect two disparate datasets
• want visualizations that combine multiple datasets
54. @AlexisKSanders
Example use cases
• Competitive dashboard
• Paid/organic report
• Keyword analytics
• Ranking factor correlation visualizations
• Forecasting
• Maps (with local data)
• Anything you’d use an R script with
55. @AlexisKSanders
How to get yourself and your team hooked on Power BI (in ½ day’s worth of work)
1. Download Power BI (search [download Power BI])
2. Log in w/ office email
3. Watch Wil Reynolds’ videos
   Part I (30 minutes)
   Part II (30 minutes)
   Get paid conversion data, map against current rankings
4. Download a dataset on Kaggle.com, create visualizations
5. Watch these videos on relationships within Power BI
   Understanding Relationships in Power BI (18 m)
   Power BI Relationship Step-by-Step Example (12 m)
   Looking at Many-to-Many Relationships (8 m)
6. Download organic ranking data (e.g., SEMrush, Ahrefs) and a crawl (from Screaming Frog) for 3x competitors
   Answer: does word length correlate with better rankings? (go through each value in SF)
   Answer: what categories are competitors outperforming? (Note: will have to categorize KW data)
7. Do a mini-challenge on who can create the best visual (winner gets a small prize)
57. @AlexisKSanders
Types of traditional UX/UI site tests
• A/B tests (side note: a form of statistical inference)
  • Control versus challenger
• Click heatmaps
• Qualitative testing
  • Interviews
  • Focus groups
  • Surveys
• Usability tests
  • Screen recordings for basic tasks
  • Tree testing / card-sorting for information findability
• Biology-based
  • Pulse
  • Eye tracking
  • Sweat
• Statistical inference
https://www.userinterviews.com/ux-research-field-guide-chapter/user-research-tools
58. @AlexisKSanders
SEO campaign testing
• A/B tests
  • Control versus variant(s)
• Pre-/post-analysis
  • Run some campaign/intervention and compare pre- and post-performance
• Causal inference
  • Comparing estimated performance (based on pre-period and possibly control data) versus actual performance
60. @AlexisKSanders
Challenges w/ forecasting time series data
• We’re making an assumption that the past has something to do with the future
• Some models require stationarity (meaning that the mean and variance are constant)
• Creating very high-quality forecasts requires substantial experience*
• There are multiple levers to pull / parameters to adjust
• High-potential area for technical debt
• There’s always going to be some level of uncertainty
*From the FB Prophet paper -> https://peerj.com/preprints/3190.pdf
61. @AlexisKSanders
There are a lot of forecasting models…
Perception of modeling types, “least” to “most” cool:
• less cool: Moving Average, Linear Regression, AutoRegression
• standard level of cool: Exponential Smoothing, (S)ARIMA(X), BSTS, Additive Model
• coolest: RNN, LSTM
The coolest doesn’t mean best results.
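The simplest model on this spectrum, a moving average, fits in a few lines of plain Python. A minimal sketch with made-up session numbers; real forecasting tools add trend, seasonality, and uncertainty on top of this:

```python
# Naive moving-average forecast: predict the next value as the
# mean of the last `window` observations. All numbers are made up.
def moving_average_forecast(series, window=3):
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    return sum(series[-window:]) / window

sessions = [120, 132, 128, 141, 150, 147]   # hypothetical daily organic sessions
next_day = moving_average_forecast(sessions)  # (141 + 150 + 147) / 3 = 146.0
```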
62. @AlexisKSanders
CausalImpact for id’ing (potential) causation
Takes actuals, uses the pre-period to create an estimate, and shows the difference post-intervention.
• Black line = actual metric, trended
• Blue dot = estimated
• Light blue = confidence interval
• Actual minus the estimated performance
• Cumulative change
https://bit.ly/causalimpact-seo
https://www.distilled.net/diy-splittester/
https://bit.ly/pshapiro-ci
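The arithmetic behind those panels can be sketched in plain Python with a deliberately naive counterfactual (the pre-period mean). This only illustrates "actual minus estimated" and the cumulative change; CausalImpact itself fits a Bayesian structural time-series model, and all numbers below are made up:

```python
# Sketch of the three CausalImpact-style panels with a naive
# counterfactual (the pre-period mean); the real tool fits a
# Bayesian structural time-series model instead.
pre = [100, 104, 98, 102, 96]     # metric before the intervention
post = [110, 115, 112, 120, 118]  # metric after the intervention

estimated = sum(pre) / len(pre)   # counterfactual: what we expected
pointwise = [actual - estimated for actual in post]  # actual minus estimated

cumulative = []                   # running total of the lift
total = 0.0
for diff in pointwise:
    total += diff
    cumulative.append(total)
```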
65. @AlexisKSanders
• Many analysts use FB Prophet: https://facebook.github.io/prophet/
• Forecasting overview python notebook: https://bit.ly/forecasting-overview
• There are a lot of models for forecasting
• We’ll talk about evaluating (soon young grasshopper)
• Book on time series https://bit.ly/time-series-4 (recommended by
my stat professor colleague, ty Kari!!)
70. @AlexisKSanders
1.5. What if it says I don’t have the library?!
• Search [pip install library] online, use !pip install [library-name]
• Or find the conda install and use the Anaconda prompt to install
71. @AlexisKSanders
2. pd.read_csv(data)
1. Add the file to the same folder as the Jupyter Notebook
2. Use pandas to read_csv(data)
3. Check the data with [dataset-name].head()
(you can also look at [dataset-name].tail() for the last rows of data, or just type the dataset name and it will print the dataset (w/ ellipses in the middle if too large))
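A minimal sketch of those steps, assuming pandas is installed; it feeds read_csv() an in-memory string via io.StringIO instead of a file on disk so the example is self-contained, and the column names are invented:

```python
import io
import pandas as pd

# Stand-in for a CSV file sitting next to the notebook;
# the column names are invented for illustration.
csv_file = io.StringIO("page,sessions\n/home,120\n/blog,85\n/about,40")

dataset = pd.read_csv(csv_file)  # step 2: load it with pandas
print(dataset.head())            # step 3: sanity-check the first rows
print(dataset.tail(2))           # the last rows work the same way
```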
73. @AlexisKSanders
4. dataset.function(arguments*)
• Functions (aka, methods) are used to transform data or return information
• You’ve probably used len(), concatenate(), sum() in Excel
• It’s the same idea here: we provide inputs to the function, the function gives us outputs
74. @AlexisKSanders
5. We can define our own functions
• Start with def function_name(arguments):
• Add the logical steps (it’s like a math proof) indented below (with a tab or four spaces)
• Buuuut we often use functions created by other programmers
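A tiny sketch of both steps; the name and logic (a hypothetical click-through-rate helper) are invented for illustration:

```python
# A tiny user-defined function: a hypothetical click-through-rate
# helper. The indented lines are the function body.
def click_through_rate(clicks, impressions):
    if impressions == 0:  # guard against dividing by zero
        return 0.0
    return clicks / impressions

ctr = click_through_rate(50, 1000)  # 50 / 1000 = 0.05
```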
75. @AlexisKSanders
6. Python doesn’t require the programmer to specify the data type… so, uh… don’t worry about it* (…until you need to worry about it…)
• w/ Python, use type(object-name)
• w/ pandas, use object-name.dtypes
*everyone should be cognizant of data types
76. @AlexisKSanders
6. Python doesn’t require the programmer to specify the data type… so, uh… don’t worry about it* (…until you need to worry about it…)
• dir(object-name) for a list of attributes and methods
• object-name.__doc__ for documentation on the object (if it exists)
*everyone should be cognizant of data types
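A quick sketch of these introspection tools in plain Python (pandas’ .dtypes works analogously on a DataFrame):

```python
# Inspecting objects when you haven't declared their types.
value = 42
print(type(value))         # <class 'int'>

methods = dir("hello")     # list of a string's attributes and methods
print("upper" in methods)  # True: strings have an .upper() method

print(len.__doc__)         # documentation on the built-in, if it exists
```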
77. @AlexisKSanders
7. Pandas useful stuff
• In pandas, they call a table a “DataFrame”
• A “Series” is basically a column of data
• dataframe_name.to_csv() to export a df to csv
• dataframe_name.drop_duplicates(subset='column-name') to remove rows with duplicate values in a column
• Check out https://www.dataschool.io/python-pandas-tips-and-tricks/
• Follow @justmarkham
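A small sketch of the DataFrame/Series ideas above, assuming pandas is installed; the data and column names are made up, and to_csv() writes to an in-memory buffer rather than a file:

```python
import io
import pandas as pd

# A small DataFrame (the "table"); each column is a Series.
# Data and column names are made up.
df = pd.DataFrame({"keyword": ["seo", "seo", "analytics"],
                   "clicks": [10, 10, 7]})

deduped = df.drop_duplicates(subset="keyword")  # drop repeated keywords

buffer = io.StringIO()
deduped.to_csv(buffer, index=False)  # export the frame as CSV text
```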
81. @AlexisKSanders
• Data Analysis in Python with Pandas (~ 8 hours, buuuut it’s very fun,
engaging and useful!)
• FB Prophet:
https://facebook.github.io/prophet/docs/quick_start.html#python-api
• SQL Python (~1.5 hours)
• Forecasting with LSTM
• Tips on installing Tensorflow and Keras (that was actually helpful)
Learning practical data science python
86. @AlexisKSanders
Descriptive versus Inference Statistics
• Descriptive: explains the data you have (e.g., there are 5 red balls)
• Inference: assumptions about the population based on a “representative” sample (e.g., 10% chance of a red ball*)
*note: we know it’s actually only 5%... so there is some margin of error
87. @AlexisKSanders
• Step 1: Create a hypothesis (H1), (aka, alternative)
• H1: campaign sales mean for testing period > 0
• (meaning: the campaign drove an increase in sales)
• Step 1.5: Reverse for null hypothesis (H0)
• H0: campaign sales average ≤ 0
A common form of inference statistics = hypothesis testing
88. @AlexisKSanders
Concept:
You can show that the alternative
hypothesis is true by disproving
the null hypothesis.
So by rejecting the null hypothesis,
the only situation that could
possibly exist is the world in which
the alternative hypothesis is true.
89. @AlexisKSanders
Step 3: sample the population; Step 4: calculate the p-value
• p-value = the probability of observing a sample at least this extreme if H0 were true
• p-value < alpha = statistically significant = we reject H0
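In practice you’d get the p-value from a library like scipy.stats; purely as a sketch, here is an upper-tail p-value under a normal approximation, with made-up campaign numbers:

```python
import math

# Upper-tail p-value under a normal approximation: the chance of a
# sample mean at least this extreme if H0 were true. The campaign
# numbers are made up; a library like scipy.stats does this for you.
def one_sided_p_value(sample_mean, null_mean, std_error):
    z = (sample_mean - null_mean) / std_error
    return 0.5 * math.erfc(z / math.sqrt(2))

p = one_sided_p_value(4.2, 0, 2.0)  # hypothetical mean sales lift and SE
alpha = 0.05
reject_h0 = p < alpha  # statistically significant at the 0.05 level
```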
90. @AlexisKSanders
Step 5: Decide statistical significance (alpha) criteria
• Generally: 0.05
• Means we will reject the null hypothesis if there is a <5% chance (under H0) that we would find a sample as “extreme” as we found
91. @AlexisKSanders
Effect size matters
• Effect size = how much the intervention affects the data
• TL;DR: with a large enough sample, you will often be able to find statistical significance, even if the effect size isn’t big enough to care about
96. @AlexisKSanders
Training, validation, and test set
About partitioning your data to test how accurate your models are.
• Training: use this for training and fitting data with the model
• Validation: the midterm test; practice rounds
• Test: for testing model accuracy (don’t look at this dataset… ever)
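A minimal sketch of a 60/20/20 positional split in plain Python (the sizes and data are arbitrary; in practice you’d usually shuffle first, unless the split should respect time order):

```python
# Partition a dataset 60/20/20 by position. Sizes and data are
# arbitrary; shuffle first in practice unless order matters.
data = list(range(100))

n = len(data)
train = data[: int(n * 0.6)]                   # fit the model here
validation = data[int(n * 0.6): int(n * 0.8)]  # "midterm test" for tuning
test = data[int(n * 0.8):]                     # final check; don't peek early

assert len(train) + len(validation) + len(test) == n
```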
97. @AlexisKSanders
We evaluate each by checking error (e.g., one way = Root Mean Squared Error)
Step 1: how many points? n = 10
Step 2: sum the distance between each point and the line (i.e., the error), squared (to remove negatives): here, (20)² = 400
Step 3: multiply 1/n (i.e., average) * amount of error: 1/10 * (20)² = 400/10 = 40
Step 4: take the root (inverse of square): √40 ≈ 6.325
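The four steps can be reproduced in a few lines of Python; the residuals below are invented, but chosen so the totals match the worked numbers (one point off by 20, n = 10):

```python
import math

# RMSE following the four steps. The residuals are invented, but
# chosen so the totals match the worked numbers: one point is off
# by 20, so squared error totals (20)^2 = 400 across n = 10 points.
residuals = [0, 0, 0, 0, 0, 0, 0, 0, 0, 20]

n = len(residuals)                     # step 1: how many points? n = 10
squared = [e ** 2 for e in residuals]  # step 2: square each error
mean_squared = sum(squared) / n        # step 3: 400 / 10 = 40
rmse = math.sqrt(mean_squared)         # step 4: sqrt(40) ≈ 6.325
```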
98. @AlexisKSanders
Buuuuut an analyst would never do that by hand (you’d use something like sklearn.metrics’ mean_squared_error); an easy alternative is simply plotting residuals.
Huge thanks to Kari Nelson for providing samples!
99. @AlexisKSanders
• In inference statistics there is always margin of sampling and non-
sampling error
• p-value < alpha (your chosen statistical significance criteria) =
statistical significance
• When modeling data, be aware of under- and overfitting models
• Use validation and test data to assess model accuracy
101. @AlexisKSanders
• It’s not about starting with the perfect, automated
reporting, it’s about iteration
• Simple, accurate analyses are okay
• A case study can always be revisited
• Collect knowledge
• Play around with visualizations, options available in BI
systems, and scripting