In the online world, user engagement refers to the quality of the user experience that emphasizes the phenomena associated with wanting to use a web application longer and more frequently. User engagement is a multifaceted, complex phenomenon, giving rise to a number of approaches for its measurement: self-reporting (e.g., questionnaires); observational methods (e.g., facial expression analysis, desktop actions); and web analytics using online behavior metrics. These methods represent different trade-offs between the scale of the data analyzed and the depth of understanding they provide. For instance, surveys are hardly scalable but offer rich, qualitative insights, whereas click data can be collected at large scale but are more difficult to interpret. Still, it remains unclear which core research questions each type of measurement can answer. This talk presents various efforts that combine approaches to measuring engagement, and seeks to provide insight into which questions to ask when measuring engagement.
Keynote at 18th International Conference on Application of Natural Language to Information Systems (NLDB2013), University of Salford, MediaCityUK
Blog: http://labtomarket.wordpress.com
To be or not be engaged: What are the questions (to ask)?
1. To be or not be engaged: What are the questions (to ask)?
Mounia Lalmas
Yahoo! Labs Barcelona
mounia@acm.org
2. About me
• Since January 2011: Visiting Principal Scientist at Yahoo! Labs Barcelona
  • User engagement, social media, search
• 1999-2008: Lecturer (assistant professor) to Professor at Queen Mary, University of London
  • XML retrieval and evaluation (INEX)
• 2008-2010: Microsoft Research/RAEng Research Professor at the University of Glasgow
  • Quantum theory to model information retrieval
Blog: labtomarket.wordpress.com
3. Why is it important to engage users?
• In today's wired world, users have enhanced expectations about their interactions with technology … resulting in increased competition amongst the purveyors and designers of interactive systems.
• In addition to utilitarian factors, such as usability, we must consider the hedonic and experiential factors of interacting with technology, such as fun, fulfillment, play, and user engagement.
• In order to make engaging systems, we need to understand what user engagement is and how to measure it.
4. Why is it important to measure and interpret user engagement well?
[Figure: CTR example]
5. Outline
• What is user engagement?
• What are the characteristics of user engagement?
• How to measure user engagement?
• What are the questions to ask?
saliency, interesting, serendipity, relevance, sentiment, reading, news, social media, user generated content, automatic linking, aesthetics
8. What is user engagement?
User engagement is a quality of the user experience that emphasizes the positive aspects of interaction – in particular the fact of being captivated by the technology (Attfield et al., 2011).
The emotional, cognitive and behavioural connection that exists, at any point in time and over time, between a user and a technological resource:
• user feelings: happy, sad, excited, …
• user interactions: click, read, comment, buy, …
• user mental states: involved, lost, concentrated, …
9. Considerations in the measurement of user engagement
• Short term (within session) and long term (across multiple sessions)
• Laboratory vs. field studies
• Subjective vs. objective measurement
• Large scale (e.g., dwell time of 100,000 people) vs. small scale (gaze patterns of 10 people)
• User engagement as process vs. as product
One is not better than the other; it depends on what the aim is.
11. Characteristics of user engagement (I)
Focused attention (Webster & Ho, 1997; O'Brien, 2008)
• Users must be focused to be engaged
• Distortions in the subjective perception of time are used to measure it
Positive affect (O'Brien & Toms, 2008)
• Emotions experienced by the user are intrinsically motivating
• An initial affective "hook" can induce a desire for exploration, active discovery or participation
Aesthetics (Jacques et al., 1995; O'Brien, 2008)
• Sensory, visual appeal of the interface stimulates the user & promotes focused attention
• Linked to design principles (e.g. symmetry, balance, saliency)
Endurability (Read, MacFarlane & Casey, 2002; O'Brien, 2008)
• People remember enjoyable, useful, engaging experiences and want to repeat them
• Reflected in e.g. the propensity of users to recommend an experience/a site/a product
12. Characteristics of user engagement (II)
Novelty (Webster & Ho, 1997; O'Brien, 2008)
• Novelty, surprise, unfamiliarity and the unexpected
• Appeals to users' curiosity; encourages inquisitive behavior and promotes repeated engagement
Richness and control (Jacques et al., 1995; Webster & Ho, 1997)
• Richness captures the growth potential of an activity
• Control captures the extent to which a person is able to achieve this growth potential
Reputation, trust and expectation (Attfield et al., 2011)
• Trust is a necessary condition for user engagement
• An implicit contract among people and entities which is more than technological
Motivation, interests, incentives, and benefits (Jacques et al., 1995; O'Brien & Toms, 2008)
• Difficulties in setting up "laboratory"-style experiments
• Why should users engage?
14. Measuring user engagement
• Self-reported engagement – questionnaire, interview, report, product reaction cards, think-aloud
  Characteristics: subjective; short- and long-term; lab and field; small-scale; product outcome
• Cognitive engagement – task-based methods (time spent, follow-on task); physiological measures (e.g. EEG, SCL, fMRI, eye tracking, mouse-tracking)
  Characteristics: objective; short-term; lab and field; small-scale and large-scale; process outcome
• Interaction engagement – web analytics metrics + models
  Characteristics: objective; short- and long-term; field; large-scale; process outcome
15. Large-scale measurements of user engagement – Web analytics
Intra-session measures:
• Dwell time / session duration
• Play time (video)
• Mouse movement
• Click-through rate (CTR)
• Number of pages viewed (click depth)
• Conversion rate (mostly for e-commerce)
• Number of UGC items (comments)
Inter-session measures:
• Fraction of return visits
• Time between visits (inter-session time, absence time)
• Total view time per month (video)
• Lifetime value (number of actions)
• Number of sessions per unit of time
• Total usage time per unit of time
• Number of friends on site (social networks)
• Number of UGC items (comments)
• Intra-session engagement measures our success in attracting the user to remain on our site for as long as possible.
• Inter-session engagement can be measured directly or, for commercial sites, by observing lifetime customer value.
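Most of these metrics reduce to simple arithmetic over a timestamped interaction log. Below is a minimal sketch, assuming per-user event timestamps and the common (but arbitrary) 30-minute inactivity gap for splitting sessions; the names and threshold are illustrative, not a standard implementation.

    SESSION_GAP = 30 * 60  # heuristic: a 30-minute gap of inactivity starts a new session

    def sessionize(timestamps, gap=SESSION_GAP):
        """Split a non-empty, sorted list of event timestamps (seconds) into sessions."""
        sessions, current = [], [timestamps[0]]
        for t in timestamps[1:]:
            if t - current[-1] > gap:
                sessions.append(current)
                current = [t]
            else:
                current.append(t)
        sessions.append(current)
        return sessions

    def engagement_metrics(timestamps):
        """A few intra- and inter-session measures for one user."""
        sessions = sessionize(sorted(timestamps))
        dwell_times = [s[-1] - s[0] for s in sessions]  # intra-session: dwell time
        absences = [b[0] - a[-1] for a, b in zip(sessions, sessions[1:])]  # inter-session
        return {
            "n_sessions": len(sessions),
            "mean_dwell_time": sum(dwell_times) / len(sessions),
            "mean_absence_time": sum(absences) / len(absences) if absences else None,
            "total_usage_time": sum(dwell_times),
        }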
17. Signals – Signals – Signals: Five studies
self-reported engagement … interaction engagement
WHAT ARE THE QUESTIONS TO ASK?
18. STUDY I
• Domain: entertainment news
• Study: saliency
• Measurement: focused attention and affect
+ Lori McCay-Peet
+ Vidhya Navalpakkam
19. • How the visual catchiness (saliency) of "relevant" information impacts user engagement metrics such as focused attention and emotion (affect)
  • focused attention refers to the exclusion of other things
  • affect relates to the emotions experienced during the interaction
• Saliency model of visual attention developed by Itti & Koch (2000)
• Self-reported engagement
20. Manipulating saliency
[Figure: web page screenshot and its saliency maps under the salient and non-salient conditions]
(McCay-Peet et al., 2012)
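The saliency maps came from the Itti & Koch model. As a rough illustration of the center-surround contrast idea such models build on, here is a crude single-channel approximation (very much not the full model), assuming a grayscale image as a NumPy array:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def crude_saliency(gray):
        """Very rough center-surround saliency: difference between a fine and a
        coarse Gaussian blur of image intensity, normalized to [0, 1].
        The real Itti & Koch (2000) model combines many channels and scales."""
        center = gaussian_filter(gray.astype(float), sigma=2)
        surround = gaussian_filter(gray.astype(float), sigma=16)
        sal = np.abs(center - surround)
        return (sal - sal.min()) / (np.ptp(sal) + 1e-9)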
21. Study design
• 8 tasks = finding the latest news or headline on a celebrity or entertainment topic
• Affect measured pre- and post-task using the Positive and Negative Affect Schedule (PANAS): positive items e.g. "determined", "attentive"; negative items e.g. "hostile", "afraid"
• Focused attention measured with the 7-item focused attention subscale (e.g. "I was so involved in my news tasks that I lost track of time", "I blocked things out around me when I was completing the news tasks") and perceived time
• Interest level in topics (pre-task) and questionnaire (post-task), e.g. "I was interested in the content of the web pages", "I wanted to find out more about the topics that I encountered on the web pages"
• 189 (90+99) participants from Amazon Mechanical Turk
22. PANAS (10 positive items and 10 negative items)
• You feel this way right now, that is, at the present moment [1 = very slightly or not at all; 2 = a little; 3 = moderately; 4 = quite a bit; 5 = extremely] [randomize items]
• Negative: distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, afraid
• Positive: interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, active
(Watson, Clark & Tellegen, 1988)
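Scoring PANAS is plain arithmetic: each subscale score is the sum of its ten 1–5 item responses, giving a 10–50 range per subscale. A minimal sketch, assuming responses arrive as a dict keyed by item name:

    POSITIVE = ["interested", "excited", "strong", "enthusiastic", "proud",
                "alert", "inspired", "determined", "attentive", "active"]
    NEGATIVE = ["distressed", "upset", "guilty", "scared", "hostile",
                "irritable", "ashamed", "nervous", "jittery", "afraid"]

    def panas_scores(responses):
        """responses: dict mapping item name -> rating in 1..5.
        Returns (positive affect, negative affect), each in 10..50."""
        pa = sum(responses[item] for item in POSITIVE)
        na = sum(responses[item] for item in NEGATIVE)
        return pa, na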
23. 7-item focused attention subscale (part of the 31-item user engagement scale)
5-point scale (strongly disagree to strongly agree)
1. I lost myself in this news tasks experience
2. I was so involved in my news tasks that I lost track of time
3. I blocked things out around me when I was completing the news tasks
4. When I was performing these news tasks, I lost track of the world around me
5. The time I spent performing these news tasks just slipped away
6. I was absorbed in my news tasks
7. During the news tasks experience I let myself go
(O'Brien & Toms, 2010)
24. Saliency and positive affect
• When headlines are visually non-salient: users are slow at finding them, report more distraction due to web page features, and show a drop in affect
• When headlines are visually catchy or salient: users find them faster, report that it is easy to focus, and maintain positive affect
• Saliency helps task performance, focusing/avoiding distraction, and maintaining positive affect
25. Saliency and focused attention
• Adapted the focused attention subscale from the online shopping domain to the entertainment news domain
• Users reported it was "easier to focus in the salient condition" BUT there was no significant improvement in the focused attention subscale, nor differences in perceived time spent on tasks
• User interest in web page content is a good predictor of focused attention, which in turn is a good predictor of positive affect
26. Self-reporting, crowdsourcing, saliency and user engagement
• The interaction of saliency, focused attention, and affect, together with user interest, is complex.
• Using crowdsourcing worked!
• What next?
  • include web page content as a quality of user engagement in the focused attention scale
  • more "realistic" user (interactive) reading experience
  • other measurements: mouse-tracking, eye-tracking, facial expression analysis, etc.
(McCay-Peet, Lalmas & Navalpakkam, 2012)
27. STUDY II
• Domain: news and user generated content (comments)
• Study: interestingness and sentiment
• Measurement: focused attention, affect and gaze
+ Ioannis Arapakis
+ Barla Cambazoglu
+ Mari-Carmen Marcos
+ Joemon Jose
28. Gaze and self-reporting
• News + comments
• Sentiment, interest
• 57 users (lab-based)
• Reading task (114)
• Questionnaire (qualitative data)
• Eye-tracking recordings (quantitative data)
Three metrics: gaze, focused attention and positive affect
(Lin et al., 2007)
29. Interesting content promotes user engagement metrics
• All three metrics: focused attention, positive affect & gaze
• What is the right trade-off?
  • news is news ☺
• Can we predict?
  • provider, editor, writer, category, genre, visual aids, …, sentimentality, …
• Role of user-generated content (comments)
  • As a measure of engagement?
  • To promote engagement?
30. Lots of sentiment, but with negative connotations!
• Positive affect (and interest, enjoyment and wanting to know more) correlates
  • positively (↑) with sentimentality (lots of emotions)
  • negatively (↓) with positive polarity (happy news)
SentiStrength (from -5 to 5 per word):
• sentimentality: sum of absolute values (amount of sentiment)
• polarity: sum of values (direction of the sentiment: positive vs negative)
(Thelwall, Buckley & Paltoglou, 2012)
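Given per-word scores in [-5, 5] (SentiStrength itself is an external tool; the scores are assumed to be already computed here), the two measures above are one line each:

    def sentimentality(word_scores):
        """Amount of sentiment: sum of absolute per-word scores."""
        return sum(abs(s) for s in word_scores)

    def polarity(word_scores):
        """Direction of sentiment: plain sum (positive vs. negative)."""
        return sum(word_scores)

    # e.g. per-word scores under some hypothetical lexicon
    scores = [-4, 3, 0]
    print(sentimentality(scores), polarity(scores))  # -> 7 -1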
31. Effect of comments on user engagement
• 6 rankings of comments:
  • most replied, most popular, newest
  • sentimentality high, sentimentality low
  • polarity plus, polarity minus
• Longer gaze on
  • newest and most popular for interesting news
  • most replied and high sentimentality for non-interesting news
• Can we leverage this to prolong user attention?
32. Gaze, sentimentality, interest
• Interesting and "attractive" content!
• Sentiment as a proxy for focused attention, positive affect and gaze?
• Next
  • Larger-scale study
  • Other domains (beyond news!)
  • Role of social signals (e.g. Facebook, Twitter)
  • Lots more data: mouse tracking, EEG, facial expression
(Arapakis et al., 2013)
33. STUDY III
• Domain: news and social media (Wikipedia)
• Study: interestingness, aesthetics, task
• Measurement: focused attention, affect and mouse movement
+ David Warnock
34. Mouse tracking and self-reporting
• 324 users from Amazon Mechanical Turk (between-subject design)
• Two domains (BBC News and Wikipedia)
• Two tasks (reading and search)
• "Normal vs Ugly" interface
• Questionnaires (qualitative data)
  • focused attention, positive affect, novelty, interest, usability, aesthetics
  • + demographics, handedness & hardware
• Mouse tracking (quantitative data)
  • movement speed, movement rate, click rate, pause length, percentage of time still
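A minimal sketch of the listed mouse-tracking features, assuming events arrive as (timestamp, x, y, is_click) tuples; these feature definitions are plausible reconstructions for illustration, not the study's exact ones:

    import math

    def mouse_features(events):
        """events: list of (t, x, y, is_click) tuples, sorted by timestamp t (seconds)."""
        duration = events[-1][0] - events[0][0]
        dist = moving_time = 0.0
        pauses = []
        for (t0, x0, y0, _), (t1, x1, y1, _) in zip(events, events[1:]):
            step = math.hypot(x1 - x0, y1 - y0)
            dt = t1 - t0
            if step > 0:
                dist += step
                moving_time += dt
            else:
                pauses.append(dt)
        clicks = sum(1 for _, _, _, c in events if c)
        return {
            "movement_speed": dist / moving_time if moving_time else 0.0,  # px/s while moving
            "movement_rate": dist / duration if duration else 0.0,         # px/s over the task
            "click_rate": clicks / duration if duration else 0.0,
            "mean_pause_length": sum(pauses) / len(pauses) if pauses else 0.0,
            "pct_time_still": sum(pauses) / duration if duration else 1.0,
        }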
36. Mouse tracking can tell about
• Age
• Hardware
  • Mouse
  • Trackpad
• Task
  • Searching: There are many different types of phobia. What is Gephyrophobia a fear of?
  • Reading: (Wikipedia) Archimedes, Section 1: Biography
37. Mouse tracking could not tell much about
• focused attention and positive affect
• user interest in the task/topic
• BUT BUT BUT BUT
  • the "ugly" variant did not result in lower aesthetics scores, although BBC > Wikipedia
  • BUT – the comments left …
    • Wikipedia: "The website was simply awful. Ads flashing everywhere, poor text colors on a dark blue background."; "The webpage was entirely blue. I don't know if it was supposed to be like that, but it definitely detracted from the browsing experience."
    • BBC News: "The website's layout and color scheme were a bitch to navigate and read."; "Comic sans is a horrible font."
38. Mouse tracking and user engagement
• Task and hardware
• Do we have a Hawthorne Effect???
• "Usability" vs engagement
• "Even uglier" interface?
• Within- vs between-subject design?
• What next?
  • Sequence of movements
  • Automatic clustering
(Warnock & Lalmas, 2013)
39. STUDY IV
• Domain: news
• Study: automatic linking
• Measurement: interestingness
+ Ioannis Arapakis
+ Hakan Ceylan
+ Pinar Donmez
41. LEPA: Linker for Events to Past Articles
LEPA is a fully automated approach to constructing hyperlinks in news articles using "simple" text processing and understanding techniques.
Indexer
• Processes articles over a time period by extracting features from each article and storing them to facilitate faster retrieval
Linker
• Identifies sentences that contain newsworthy events
• For each such event, retrieves from the index all the matching articles and links the top-ranked one with the event
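A skeletal sketch of the two-stage indexer/linker pipeline described above. The bag-of-words features, overlap scoring, and pluggable event detector are illustrative stand-ins, not LEPA's actual feature extraction, event detection, or ranking:

    from collections import Counter

    def extract_features(text):
        # Toy stand-in for LEPA's article features: a bag-of-words Counter.
        return Counter(text.lower().split())

    def build_index(archive):
        # Indexer: process past articles once, store features for fast retrieval.
        # archive: dict mapping article_id -> article text.
        return {aid: extract_features(text) for aid, text in archive.items()}

    def similarity(f1, f2):
        # Simple term-overlap score; LEPA's real matching is not specified here.
        return sum((f1 & f2).values())

    def link_events(sentences, index, is_newsworthy):
        # Linker: for each sentence judged to contain a newsworthy event,
        # retrieve matching past articles and link the top-ranked one.
        links = {}
        for sentence in sentences:
            if not is_newsworthy(sentence):
                continue
            feats = extract_features(sentence)
            best = max(index, key=lambda aid: similarity(feats, index[aid]), default=None)
            if best is not None and similarity(feats, index[best]) > 0:
                links[sentence] = best
        return links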
43. Pilot study
• Professional editors rated a collection of system-embedded links (164 article-link combinations) on a 5-point scale: (i) bad, (ii) fair, (iii) good, (iv) excellent, and (v) not judged
• Rating results:
  • Bad: 35.15%
  • Fair: 33.93%
  • Good: 20%
  • Excellent: 9.09%
  • Not judged: 1.81%
• With 63.03% of the links rated fair or better:
  • initial evidence that LEPA is not too far from the optimum achieved by human editors
44. Assessing the links: are they related?
• 664 participants recruited through Amazon Mechanical Turk; between-group design (two groups)
• Precision = fraction of links (total = 164) that received, in terms of relatedness, a score equal to or greater than 3 on a 5-point Likert scale

                            System-Embedded Links     Manually-Curated Links
                            A      B      All         A      B      All
Related to the main theme   49%    42%    45%         54%    51%    53%
Related to subtopic         21%    24%    22%         31%    34%    33%
Tangentially related        13%    15%    14%          9%    12%    10%
Unrelated                   15%    16%    16%          5%     1%     3%
Other                        2%     2%     2%          1%     2%     1%
(A, B = the two participant groups)
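The precision measure above is a thresholded fraction over per-link relatedness scores; a minimal sketch:

    def precision_at_threshold(scores, threshold=3):
        """Fraction of links whose relatedness score is >= threshold (here 3 of 5)."""
        return sum(s >= threshold for s in scores) / len(scores)

    # e.g. the 164 per-link scores would go in this list
    print(precision_at_threshold([5, 3, 2, 4, 1, 3]))  # -> 0.666...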
45. Assessing the reading experience
• 120 participants recruited through Amazon Mechanical Turk; between-groups design (three groups)
• Editors + two opposite "extremes" of LEPA:
  • High recall: best at embedding newsworthy links & articles that provide interesting insights
  • High precision: best in terms of embedding the right number of links
• Inductive, thematic coding of open-ended questions; emerging themes: good topical coverage, informativeness, broader perspective, interesting insights, link presentation, content volume, positive news reading experience
46. Automatic linking and news reading experience
• Even under realistic and uncontrolled conditions, the performance of LEPA is comparable to that of editors, and in some cases better
• High precision vs. high recall
  • A high precision threshold leads to a better news reading experience: less is more
"They were too many, being mostly quite long, in some cases more than half the length of the main article, and sometimes they repeated the same identical information"
47. STUDY V
• Domain: social media (Yahoo! Answers and Wikipedia)
• Study: serendipity
• Measurement: relevance, unexpectedness, interestingness
+ Ilaria Bordino
+ Yelena Mejova
48. Entity-driven exploratory search
Linguistically Motivated Semantic Aggregation Engines: "transition to a truly semantic aggregation paradigm where machines understand a user's intent, discover and organize facts, identify opinions, experiences and trends"
Entity search: we build an entity-driven serendipitous search system based on entity networks extracted from Wikipedia and Yahoo! Answers
Serendipity: finding something good or useful while not specifically looking for it; serendipitous search systems provide relevant and interesting results
49. Yahoo! Answers vs Wikipedia
Yahoo! Answers: community-driven question & answer portal
• 67,336,144 questions & 261,770,047 answers
• January 1, 2010 – December 31, 2011
• minimally curated; opinions, gossip, personal info; variety of points of view
Wikipedia: English-language community-driven encyclopedia
• 3,795,865 articles (English Wikipedia, as of end of December 2011)
• curated; high-quality knowledge; variety of niche topics
50. Entity & relationship extraction
• entity – any well-defined concept that has a Wikipedia page
• relationship – a topical relationship/similarity between a pair of entities based on document co-occurrence
  • related to the number of documents in which the two entities occur

Dataset         # Nodes     # Edges       Density   # Isolated
Yahoo! Answers  896,799     112,595,138   0.00028   69,856
Wikipedia       1,754,069   237,058,218   0.00015   82,381

Dataset         Avg Degree  Max Degree  Size of Largest CC
Yahoo! Answers  251         231,921     826,402 (92.15%)
Wikipedia       270         346,070     1,671,241 (95.28%)
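A minimal sketch of building such an entity co-occurrence graph, assuming the entities mentioned in each document have already been resolved to Wikipedia concepts (the entity-linking step itself is out of scope here):

    from collections import Counter
    from itertools import combinations

    def build_entity_graph(docs):
        """docs: iterable of sets of entity names appearing in each document.
        Edge weight = number of documents in which the two entities co-occur."""
        edges = Counter()
        for entities in docs:
            for a, b in combinations(sorted(entities), 2):
                edges[(a, b)] += 1
        return edges

    graph = build_entity_graph([
        {"Steve Jobs", "Steve Wozniak", "IPhone"},
        {"Steve Jobs", "IPhone"},
    ])
    print(graph[("IPhone", "Steve Jobs")])  # -> 2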
52. Retrieval
Retrieve the entities most related to a query entity using a random walk.
Query entities: Justin Bieber, Nicki Minaj, Katy Perry, Shakira, Eminem, Lady Gaga, Jose Mourinho, Selena Gomez, Kim Kardashian, Miley Cyrus, Robert Pattinson, Adele (singer), Steve Jobs, Osama bin Laden, Ron Paul, Twitter, Facebook, Netflix, IPad, IPhone, Touchpad, Kindle, Olympic Games, Cricket, FIFA, Tennis, Mount Everest, Eiffel Tower, Oxford Street, Nürburgring, Haiti, Chile, Libya, Egypt, Middle East, Earthquake, Oil spill, Tsunami, Subprime mortgage crisis, Bailout, Terrorism, Asperger syndrome, McDonald's, Vitamin D, Appendicitis, Cholera, Influenza, Pertussis, Vaccine, Childbirth
• 3 labels per query-result pair; gold standard for quality control
• Annotator agreement (overlap): 0.85
• Average overlap in top 5 results: <1
Example – top 5 results for "Steve Jobs":
• Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York
• Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.

               Wikipedia   Yahoo! Answers   Combined
Precision @ 5  0.668       0.724            0.744
MAP            0.716       0.762            0.782
53. Serendipity: "making fortunate discoveries by accident"
Serendipity = unexpectedness + relevance
• |relevant & unexpected| / |unexpected| = number of serendipitous results out of all the unexpected results retrieved
• |relevant & unexpected| / |retrieved| = number of serendipitous results out of all results retrieved
"Expected" result baselines from web search:

Baseline                                              WP            YA
Top: the 5 entities occurring most frequently
  in the top 5 search results from Bing and Google    0.63 (0.58)   0.69 (0.63)
Top–WP: same as above, but excluding the
  Wikipedia page from the results                     0.63 (0.58)   0.70 (0.64)
Rel: top 5 entities in the related-query
  suggestions provided by Bing and Google             0.64 (0.61)   0.70 (0.65)
Rel+Top: union of Top and Rel                         0.61 (0.54)   0.68 (0.57)
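Both serendipity ratios follow directly from per-result relevance and unexpectedness judgments; a minimal sketch:

    def serendipity_scores(results):
        """results: list of (relevant: bool, unexpected: bool), one per retrieved result."""
        retrieved = len(results)
        unexpected = sum(1 for _, u in results if u)
        ser = sum(1 for r, u in results if r and u)  # relevant AND unexpected
        return {
            "serendipity_of_unexpected": ser / unexpected if unexpected else 0.0,
            "serendipity_of_retrieved": ser / retrieved if retrieved else 0.0,
        }

    print(serendipity_scores([(True, True), (True, False), (False, True)]))
    # -> serendipity_of_unexpected: 0.5, serendipity_of_retrieved: ~0.33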
54. Interestingness ≠ Relevance

Interesting > Relevant                     Relevant > Interesting
Oil Spill → Penguins in Sweaters (WP)      Robert Pattinson → Water for Elephants (WP)
Lady Gaga → Britney Spears (WP)            Egypt → Cairo Conference (WP)
Netflix → Blu-ray Disc (YA)                Egypt → Ptolemaic Kingdom (WP & YA)

(Bordino, Mejova & Lalmas, 2013)
55. Assessing "interestingness": similarity (Kendall's tau-b) between result sets and reference ranking
Following (Arguello et al., 2011):
1. Labelers provide pairwise comparisons between results
2. Combine into a reference ranking
3. Compare result ranking to optimal ranking using Kendall's tau

Question                                                   Data  tau-b
Which result is more relevant to the query?                WP    0.162
                                                           YA    0.336
If someone is interested in the query, would they also     WP    0.162
be interested in the result?                               YA    0.312
Even if you are not interested in the query, is the        WP    0.139
result interesting to you personally?                      YA    0.324
Would you learn anything new about the query from          WP    0.167
the results?                                               YA    0.307
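Step 3 can be computed with SciPy, whose kendalltau defaults to the tau-b variant used here; a small sketch with illustrative ranks:

    from scipy.stats import kendalltau

    # Ranks of the same 5 results under the system ranking and under the
    # reference ranking built from pairwise labels (numbers are illustrative).
    system_ranks = [1, 2, 3, 4, 5]
    reference_ranks = [2, 1, 3, 5, 4]

    tau, p_value = kendalltau(system_ranks, reference_ranks)  # tau-b by default
    print(round(tau, 3))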
57. What are the questions to ask?
• No one measurement is perfect or complete.
• All studies (process or product) have different constraints.
• Need to ensure methods are applied consistently, with attention to reliability: what is a good signal?
• More emphasis should be placed on using mixed methods to improve the validity of the measures.
• Beware of the WEIRD syndrome (Western, Educated, Industrialized, Rich, and Democratic)
58. Acknowledgements
• Collaborators: Ioannis Arapakis, Ilaria Bordino, Barla Cambazoglu, Hakan Ceylan, Pinar Donmez, Lori McCay-Peet, Yelena Mejova, Vidhya Navalpakkam, David Warnock, and others at Yahoo! Labs.
• This talk uses some material from the tutorial "Measuring User Engagement" given at WWW 2013, Rio de Janeiro (with Heather O'Brien and Elad Yom-Tov).
Blog: labtomarket.wordpress.com