In the online world, user engagement refers to the quality of the user experience that emphasizes the phenomena associated with wanting to use a web application longer and more frequently. User engagement is a multifaceted, complex phenomenon, giving rise to a number of approaches for its measurement: self-reporting (e.g., questionnaires); observational methods (e.g., facial expression analysis, desktop actions); and web analytics using online behavior metrics. These methods represent different trade-offs between the scale of the data analyzed and the depth of understanding they provide. For instance, surveys are hardly scalable but offer rich, qualitative insights, whereas click data can be collected at large scale but are more difficult to interpret. Still, it remains unclear which core research questions each type of measurement can answer. This talk presents various efforts that combine approaches to measuring engagement, and seeks to provide insight into which questions to ask when measuring engagement.
Keynote at 18th International Conference on Application of Natural Language to Information Systems (NLDB2013), University of Salford, MediaCityUK
Blog: http://labtomarket.wordpress.com
To be or not be engaged: What are the questions (to ask)?
1. To be or not be engaged: What are the questions (to ask)?
Mounia Lalmas
Yahoo! Labs Barcelona
mounia@acm.org
2. About me
• Since January 2011: Visiting Principal Scientist at Yahoo! Labs Barcelona
  • User engagement, social media, search
• 1999-2008: Lecturer (assistant professor) to Professor at Queen Mary, University of London
  • XML retrieval and evaluation (INEX)
• 2008-2010: Microsoft Research/RAEng Research Professor at the University of Glasgow
  • Quantum theory to model information retrieval
Blog: labtomarket.wordpress.com
3. Why is it important to engage users?
• In today's wired world, users have enhanced expectations about their interactions with technology … resulting in increased competition amongst the purveyors and designers of interactive systems.
• In addition to utilitarian factors, such as usability, we must consider the hedonic and experiential factors of interacting with technology, such as fun, fulfillment, play, and user engagement.
• In order to make engaging systems, we need to understand what user engagement is and how to measure it.
4. Why is it important to measure and interpret user engagement well?
[Figure: CTR example]
5. Outline
• What is user engagement?
• What are the characteristics of user engagement?
• How to measure user engagement?
• What are the questions to ask?
saliency, interesting, serendipity, relevance, sentiment, reading, news, social media, user generated content, automatic linking, aesthetics
8. What is user engagement?
User engagement is a quality of the user experience that emphasizes the positive aspects of interaction – in particular the fact of being captivated by the technology (Attfield et al., 2011).
The emotional, cognitive and behavioural connection that exists, at any point in time and over time, between a user and a technological resource:
• user feelings: happy, sad, excited, …
• user interactions: click, read, comment, buy, …
• user mental states: involved, lost, concentrated, …
9. Considerations in the measurement of user engagement
• Short term (within session) and long term (across multiple sessions)
• Laboratory vs. field studies
• Subjective vs. objective measurement
• Large scale (e.g., dwell time of 100,000 people) vs. small scale (gaze patterns of 10 people)
• User engagement as process vs. as product
One is not better than the other; it depends on what the aim is.
11. Characteristics of user engagement (I)
Focused attention (Webster & Ho, 1997; O'Brien, 2008)
• Users must be focused to be engaged
• Distortions in the subjective perception of time are used to measure it
Positive affect (O'Brien & Toms, 2008)
• Emotions experienced by the user are intrinsically motivating
• An initial affective "hook" can induce a desire for exploration, active discovery or participation
Aesthetics (Jacques et al., 1995; O'Brien, 2008)
• Sensory, visual appeal of the interface stimulates the user & promotes focused attention
• Linked to design principles (e.g. symmetry, balance, saliency)
Endurability (Read, MacFarlane & Casey, 2002; O'Brien, 2008)
• People remember enjoyable, useful, engaging experiences and want to repeat them
• Reflected in e.g. the propensity of users to recommend an experience/a site/a product
12. Characteristics of user engagement (II)
Novelty (Webster & Ho, 1997; O'Brien, 2008)
• Novelty, surprise, unfamiliarity and the unexpected
• Appeals to users' curiosity; encourages inquisitive behavior and promotes repeated engagement
Richness and control (Jacques et al., 1995; Webster & Ho, 1997)
• Richness captures the growth potential of an activity
• Control captures the extent to which a person is able to achieve this growth potential
Reputation, trust and expectation (Attfield et al., 2011)
• Trust is a necessary condition for user engagement
• An implicit contract among people and entities which is more than technological
Motivation, interests, incentives, and benefits (Jacques et al., 1995; O'Brien & Toms, 2008)
• Difficulties in setting up "laboratory"-style experiments
• Why should users engage?
14. Measuring user engagement
• Self-reported engagement – questionnaire, interview, report, product reaction cards, think-aloud
  Characteristics: subjective; short- and long-term; lab and field; small-scale; product outcome
• Cognitive engagement – task-based methods (time spent, follow-on task); physiological measures (e.g. EEG, SCL, fMRI, eye tracking, mouse-tracking)
  Characteristics: objective; short-term; lab and field; small-scale and large-scale; process outcome
• Interaction engagement – web analytics metrics + models
  Characteristics: objective; short- and long-term; field; large-scale; process outcome
15. Large-scale measurements of user engagement – Web analytics
Intra-session measures:
• Dwell time / session duration
• Play time (video)
• Mouse movement
• Click-through rate (CTR)
• Number of pages viewed (click depth)
• Conversion rate (mostly for e-commerce)
• Number of UGC items (comments)
Inter-session measures:
• Fraction of return visits
• Time between visits (inter-session time, absence time)
• Total view time per month (video)
• Lifetime value (number of actions)
• Number of sessions per unit of time
• Total usage time per unit of time
• Number of friends on site (social networks)
• Number of UGC items (comments)
• Intra-session engagement measures our success in attracting the user to remain on our site for as long as possible.
• Inter-session engagement can be measured directly or, for commercial sites, by observing lifetime customer value.
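Most of these metrics reduce to simple arithmetic over a timestamped interaction log. Below is a minimal sketch, assuming per-user event timestamps and the common (but arbitrary) 30-minute inactivity gap for splitting sessions; the names and threshold are illustrative, not a standard implementation.

    SESSION_GAP = 30 * 60  # heuristic: a 30-minute gap of inactivity starts a new session

    def sessionize(timestamps, gap=SESSION_GAP):
        """Split a non-empty, sorted list of event timestamps (seconds) into sessions."""
        sessions, current = [], [timestamps[0]]
        for t in timestamps[1:]:
            if t - current[-1] > gap:
                sessions.append(current)
                current = [t]
            else:
                current.append(t)
        sessions.append(current)
        return sessions

    def engagement_metrics(timestamps):
        """A few intra- and inter-session measures for one user."""
        sessions = sessionize(sorted(timestamps))
        dwell_times = [s[-1] - s[0] for s in sessions]  # intra-session: dwell time
        absences = [b[0] - a[-1] for a, b in zip(sessions, sessions[1:])]  # inter-session
        return {
            "n_sessions": len(sessions),
            "mean_dwell_time": sum(dwell_times) / len(sessions),
            "mean_absence_time": sum(absences) / len(absences) if absences else None,
            "total_usage_time": sum(dwell_times),
        }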
17. Signals – Signals – Signals: Five studies
self-reported engagement … interaction engagement
WHAT ARE THE QUESTIONS TO ASK?
18. STUDY I
• Domain: entertainment news
• Study: saliency
• Measurement: focused attention and affect
+ Lori McCay-Peet
+ Vidhya Navalpakkam
19. • How the visual catchiness (saliency) of "relevant" information impacts user engagement metrics such as focused attention and emotion (affect)
  • focused attention refers to the exclusion of other things
  • affect relates to the emotions experienced during the interaction
• Saliency model of visual attention developed by Itti & Koch (2000)
• Self-reported engagement
20. Manipulating saliency
[Figure: web page screenshot and its saliency maps under the salient and non-salient conditions]
(McCay-Peet et al., 2012)
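The saliency maps came from the Itti & Koch model. As a rough illustration of the center-surround contrast idea such models build on, here is a crude single-channel approximation (very much not the full model), assuming a grayscale image as a NumPy array:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def crude_saliency(gray):
        """Very rough center-surround saliency: difference between a fine and a
        coarse Gaussian blur of image intensity, normalized to [0, 1].
        The real Itti & Koch (2000) model combines many channels and scales."""
        center = gaussian_filter(gray.astype(float), sigma=2)
        surround = gaussian_filter(gray.astype(float), sigma=16)
        sal = np.abs(center - surround)
        return (sal - sal.min()) / (np.ptp(sal) + 1e-9)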
21. Study design
• 8 tasks = finding the latest news or headline on a celebrity or entertainment topic
• Affect measured pre- and post-task using the Positive and Negative Affect Schedule (PANAS): positive items e.g. "determined", "attentive"; negative items e.g. "hostile", "afraid"
• Focused attention measured with the 7-item focused attention subscale (e.g. "I was so involved in my news tasks that I lost track of time", "I blocked things out around me when I was completing the news tasks") and perceived time
• Interest level in topics (pre-task) and questionnaire (post-task), e.g. "I was interested in the content of the web pages", "I wanted to find out more about the topics that I encountered on the web pages"
• 189 (90+99) participants from Amazon Mechanical Turk
22. PANAS (10 positive items and 10 negative items)
• You feel this way right now, that is, at the present moment [1 = very slightly or not at all; 2 = a little; 3 = moderately; 4 = quite a bit; 5 = extremely] [randomize items]
• Negative: distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, afraid
• Positive: interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, active
(Watson, Clark & Tellegen, 1988)
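Scoring PANAS is plain arithmetic: each subscale score is the sum of its ten 1–5 item responses, giving a 10–50 range per subscale. A minimal sketch, assuming responses arrive as a dict keyed by item name:

    POSITIVE = ["interested", "excited", "strong", "enthusiastic", "proud",
                "alert", "inspired", "determined", "attentive", "active"]
    NEGATIVE = ["distressed", "upset", "guilty", "scared", "hostile",
                "irritable", "ashamed", "nervous", "jittery", "afraid"]

    def panas_scores(responses):
        """responses: dict mapping item name -> rating in 1..5.
        Returns (positive affect, negative affect), each in 10..50."""
        pa = sum(responses[item] for item in POSITIVE)
        na = sum(responses[item] for item in NEGATIVE)
        return pa, na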
23. 7-item focused attention subscale (part of the 31-item user engagement scale)
5-point scale (strongly disagree to strongly agree)
1. I lost myself in this news tasks experience
2. I was so involved in my news tasks that I lost track of time
3. I blocked things out around me when I was completing the news tasks
4. When I was performing these news tasks, I lost track of the world around me
5. The time I spent performing these news tasks just slipped away
6. I was absorbed in my news tasks
7. During the news tasks experience I let myself go
(O'Brien & Toms, 2010)
24. Saliency and positive affect
• When headlines are visually non-salient: users are slow at finding them, report more distraction due to web page features, and show a drop in affect
• When headlines are visually catchy or salient: users find them faster, report that it is easy to focus, and maintain positive affect
• Saliency helps task performance, focusing/avoiding distraction, and maintaining positive affect
25. Saliency and focused attention
• Adapted the focused attention subscale from the online shopping domain to the entertainment news domain
• Users reported it was "easier to focus in the salient condition" BUT there was no significant improvement in the focused attention subscale, nor differences in perceived time spent on tasks
• User interest in web page content is a good predictor of focused attention, which in turn is a good predictor of positive affect
26. Self-reporting, crowdsourcing, saliency and user engagement
• The interaction of saliency, focused attention, and affect, together with user interest, is complex.
• Using crowdsourcing worked!
• What next?
  • include web page content as a quality of user engagement in the focused attention scale
  • more "realistic" user (interactive) reading experience
  • other measurements: mouse-tracking, eye-tracking, facial expression analysis, etc.
(McCay-Peet, Lalmas & Navalpakkam, 2012)
27. STUDY II
• Domain: news and user generated content (comments)
• Study: interestingness and sentiment
• Measurement: focused attention, affect and gaze
+ Ioannis Arapakis
+ Barla Cambazoglu
+ Mari-Carmen Marcos
+ Joemon Jose
28. Gaze and self-reporting
• News + comments
• Sentiment, interest
• 57 users (lab-based)
• Reading task (114)
• Questionnaire (qualitative data)
• Eye-tracking recordings (quantitative data)
Three metrics: gaze, focused attention and positive affect
(Lin et al., 2007)
29. Interesting content promotes user engagement metrics
• All three metrics: focused attention, positive affect & gaze
• What is the right trade-off?
  • news is news ☺
• Can we predict?
  • provider, editor, writer, category, genre, visual aids, …, sentimentality, …
• Role of user-generated content (comments)
  • As a measure of engagement?
  • To promote engagement?
30. Lots of sentiment, but with negative connotations!
• Positive affect (and interest, enjoyment and wanting to know more) correlates
  • positively (↑) with sentimentality (lots of emotions)
  • negatively (↓) with positive polarity (happy news)
SentiStrength (from -5 to 5 per word):
• sentimentality: sum of absolute values (amount of sentiment)
• polarity: sum of values (direction of the sentiment: positive vs negative)
(Thelwall, Buckley & Paltoglou, 2012)
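Given per-word scores in [-5, 5] (SentiStrength itself is an external tool; the scores are assumed to be already computed here), the two measures above are one line each:

    def sentimentality(word_scores):
        """Amount of sentiment: sum of absolute per-word scores."""
        return sum(abs(s) for s in word_scores)

    def polarity(word_scores):
        """Direction of sentiment: plain sum (positive vs. negative)."""
        return sum(word_scores)

    # e.g. per-word scores under some hypothetical lexicon
    scores = [-4, 3, 0]
    print(sentimentality(scores), polarity(scores))  # -> 7 -1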
31. Effect of comments on user engagement
• 6 rankings of comments:
  • most replied, most popular, newest
  • sentimentality high, sentimentality low
  • polarity plus, polarity minus
• Longer gaze on
  • newest and most popular for interesting news
  • most replied and high sentimentality for non-interesting news
• Can we leverage this to prolong user attention?
32. Gaze, sentimentality, interest
• Interesting and "attractive" content!
• Sentiment as a proxy for focused attention, positive affect and gaze?
• Next
  • Larger-scale study
  • Other domains (beyond news!)
  • Role of social signals (e.g. Facebook, Twitter)
  • Lots more data: mouse tracking, EEG, facial expression
(Arapakis et al., 2013)
33. STUDY III
• Domain: news and social media (Wikipedia)
• Study: interestingness, aesthetics, task
• Measurement: focused attention, affect and mouse movement
+ David Warnock
34. Mouse tracking and self-reporting
• 324 users from Amazon Mechanical Turk (between-subject design)
• Two domains (BBC News and Wikipedia)
• Two tasks (reading and search)
• "Normal vs Ugly" interface
• Questionnaires (qualitative data)
  • focused attention, positive affect, novelty, interest, usability, aesthetics
  • + demographics, handedness & hardware
• Mouse tracking (quantitative data)
  • movement speed, movement rate, click rate, pause length, percentage of time still
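A minimal sketch of the listed mouse-tracking features, assuming events arrive as (timestamp, x, y, is_click) tuples; these feature definitions are plausible reconstructions for illustration, not the study's exact ones:

    import math

    def mouse_features(events):
        """events: list of (t, x, y, is_click) tuples, sorted by timestamp t (seconds)."""
        duration = events[-1][0] - events[0][0]
        dist = moving_time = 0.0
        pauses = []
        for (t0, x0, y0, _), (t1, x1, y1, _) in zip(events, events[1:]):
            step = math.hypot(x1 - x0, y1 - y0)
            dt = t1 - t0
            if step > 0:
                dist += step
                moving_time += dt
            else:
                pauses.append(dt)
        clicks = sum(1 for _, _, _, c in events if c)
        return {
            "movement_speed": dist / moving_time if moving_time else 0.0,  # px/s while moving
            "movement_rate": dist / duration if duration else 0.0,         # px/s over the task
            "click_rate": clicks / duration if duration else 0.0,
            "mean_pause_length": sum(pauses) / len(pauses) if pauses else 0.0,
            "pct_time_still": sum(pauses) / duration if duration else 1.0,
        }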
36. Mouse tracking can tell about
• Age
• Hardware
  • Mouse
  • Trackpad
• Task
  • Searching: There are many different types of phobia. What is Gephyrophobia a fear of?
  • Reading: (Wikipedia) Archimedes, Section 1: Biography
37. Mouse tracking could not tell much about
• focused attention and positive affect
• user interest in the task/topic
• BUT BUT BUT BUT
  • the "ugly" variant did not result in lower aesthetics scores, although BBC > Wikipedia
  • BUT – the comments left …
    • Wikipedia: "The website was simply awful. Ads flashing everywhere, poor text colors on a dark blue background."; "The webpage was entirely blue. I don't know if it was supposed to be like that, but it definitely detracted from the browsing experience."
    • BBC News: "The website's layout and color scheme were a bitch to navigate and read."; "Comic sans is a horrible font."
38. Mouse tracking and user engagement
• Task and hardware
• Do we have a Hawthorne Effect???
• "Usability" vs engagement
• "Even uglier" interface?
• Within- vs between-subject design?
• What next?
  • Sequence of movements
  • Automatic clustering
(Warnock & Lalmas, 2013)
39. STUDY IV
• Domain: news
• Study: automatic linking
• Measurement: interestingness
+ Ioannis Arapakis
+ Hakan Ceylan
+ Pinar Donmez
41. LEPA: Linker for Events to Past Articles
LEPA is a fully automated approach to constructing hyperlinks in news articles using "simple" text processing and understanding techniques.
Indexer
• Processes articles over a time period by extracting features from each article and storing them to facilitate faster retrieval
Linker
• Identifies sentences that contain newsworthy events
• For each such event, retrieves from the index all the matching articles and links the top-ranked one with the event
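A skeletal sketch of the two-stage indexer/linker pipeline described above. The bag-of-words features, overlap scoring, and pluggable event detector are illustrative stand-ins, not LEPA's actual feature extraction, event detection, or ranking:

    from collections import Counter

    def extract_features(text):
        # Toy stand-in for LEPA's article features: a bag-of-words Counter.
        return Counter(text.lower().split())

    def build_index(archive):
        # Indexer: process past articles once, store features for fast retrieval.
        # archive: dict mapping article_id -> article text.
        return {aid: extract_features(text) for aid, text in archive.items()}

    def similarity(f1, f2):
        # Simple term-overlap score; LEPA's real matching is not specified here.
        return sum((f1 & f2).values())

    def link_events(sentences, index, is_newsworthy):
        # Linker: for each sentence judged to contain a newsworthy event,
        # retrieve matching past articles and link the top-ranked one.
        links = {}
        for sentence in sentences:
            if not is_newsworthy(sentence):
                continue
            feats = extract_features(sentence)
            best = max(index, key=lambda aid: similarity(feats, index[aid]), default=None)
            if best is not None and similarity(feats, index[best]) > 0:
                links[sentence] = best
        return links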
43. Pilot study
• Professional editors rated a collection of system-embedded links (164 article-link combinations) on a 5-point scale: (i) bad, (ii) fair, (iii) good, (iv) excellent, and (v) not judged
• Rating results:
  • Bad: 35.15%
  • Fair: 33.93%
  • Good: 20%
  • Excellent: 9.09%
  • Not judged: 1.81%
• With 63.03% of the links rated fair or better:
  • initial evidence that LEPA is not too far from the optimum achieved by human editors
44. Assessing the links: are they related?
• 664 participants recruited through Amazon Mechanical Turk; between-group design (two groups)
• Precision = fraction of links (total = 164) that received, in terms of relatedness, a score equal to or greater than 3 on a 5-point Likert scale

                            System-Embedded Links     Manually-Curated Links
                            A      B      All         A      B      All
Related to the main theme   49%    42%    45%         54%    51%    53%
Related to subtopic         21%    24%    22%         31%    34%    33%
Tangentially related        13%    15%    14%          9%    12%    10%
Unrelated                   15%    16%    16%          5%     1%     3%
Other                        2%     2%     2%          1%     2%     1%
(A, B = the two participant groups)
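The precision measure above is a thresholded fraction over per-link relatedness scores; a minimal sketch:

    def precision_at_threshold(scores, threshold=3):
        """Fraction of links whose relatedness score is >= threshold (here 3 of 5)."""
        return sum(s >= threshold for s in scores) / len(scores)

    # e.g. the 164 per-link scores would go in this list
    print(precision_at_threshold([5, 3, 2, 4, 1, 3]))  # -> 0.666...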
45. Assessing the reading experience
• 120 participants recruited through Amazon Mechanical Turk; between-groups design (three groups)
• Editors + two opposite "extremes" of LEPA:
  • High recall: best at embedding newsworthy links & articles that provide interesting insights
  • High precision: best in terms of embedding the right number of links
• Inductive, thematic coding of open-ended questions; emerging themes: good topical coverage, informativeness, broader perspective, interesting insights, link presentation, content volume, positive news reading experience
46. Automatic linking and news reading experience
• Even under realistic and uncontrolled conditions, the performance of LEPA is comparable to that of editors, and in some cases better
• High precision vs. high recall
  • A high precision threshold leads to a better news reading experience: less is more
"They were too many, being mostly quite long, in some cases more than half the length of the main article, and sometimes they repeated the same identical information"
47. STUDY V
• Domain: social media (Yahoo! Answers and Wikipedia)
• Study: serendipity
• Measurement: relevance, unexpectedness, interestingness
+ Ilaria Bordino
+ Yelena Mejova
48. Entity-driven exploratory search
Linguistically Motivated Semantic Aggregation Engines: "transition to a truly semantic aggregation paradigm where machines understand a user's intent, discover and organize facts, identify opinions, experiences and trends"
Entity search: we build an entity-driven serendipitous search system based on entity networks extracted from Wikipedia and Yahoo! Answers
Serendipity: finding something good or useful while not specifically looking for it; serendipitous search systems provide relevant and interesting results
49. Yahoo! Answers vs Wikipedia
Yahoo! Answers: community-driven question & answer portal
• 67,336,144 questions & 261,770,047 answers
• January 1, 2010 – December 31, 2011
• minimally curated; opinions, gossip, personal info; variety of points of view
Wikipedia: English-language community-driven encyclopedia
• 3,795,865 articles (English Wikipedia, as of end of December 2011)
• curated; high-quality knowledge; variety of niche topics
50. Entity & relationship extraction
• entity – any well-defined concept that has a Wikipedia page
• relationship – a topical relationship/similarity between a pair of entities based on document co-occurrence
  • related to the number of documents in which the two entities occur

Dataset         # Nodes     # Edges       Density   # Isolated
Yahoo! Answers  896,799     112,595,138   0.00028   69,856
Wikipedia       1,754,069   237,058,218   0.00015   82,381

Dataset         Avg Degree  Max Degree  Size of Largest CC
Yahoo! Answers  251         231,921     826,402 (92.15%)
Wikipedia       270         346,070     1,671,241 (95.28%)
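A minimal sketch of building such an entity co-occurrence graph, assuming the entities mentioned in each document have already been resolved to Wikipedia concepts (the entity-linking step itself is out of scope here):

    from collections import Counter
    from itertools import combinations

    def build_entity_graph(docs):
        """docs: iterable of sets of entity names appearing in each document.
        Edge weight = number of documents in which the two entities co-occur."""
        edges = Counter()
        for entities in docs:
            for a, b in combinations(sorted(entities), 2):
                edges[(a, b)] += 1
        return edges

    graph = build_entity_graph([
        {"Steve Jobs", "Steve Wozniak", "IPhone"},
        {"Steve Jobs", "IPhone"},
    ])
    print(graph[("IPhone", "Steve Jobs")])  # -> 2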
52. Retrieval
Retrieve the entities most related to a query entity using a random walk.
Query entities: Justin Bieber, Nicki Minaj, Katy Perry, Shakira, Eminem, Lady Gaga, Jose Mourinho, Selena Gomez, Kim Kardashian, Miley Cyrus, Robert Pattinson, Adele (singer), Steve Jobs, Osama bin Laden, Ron Paul, Twitter, Facebook, Netflix, IPad, IPhone, Touchpad, Kindle, Olympic Games, Cricket, FIFA, Tennis, Mount Everest, Eiffel Tower, Oxford Street, Nürburgring, Haiti, Chile, Libya, Egypt, Middle East, Earthquake, Oil spill, Tsunami, Subprime mortgage crisis, Bailout, Terrorism, Asperger syndrome, McDonald's, Vitamin D, Appendicitis, Cholera, Influenza, Pertussis, Vaccine, Childbirth
• 3 labels per query-result pair; gold standard for quality control
• Annotator agreement (overlap): 0.85
• Average overlap in top 5 results: <1
Example – top 5 results for "Steve Jobs":
• Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York
• Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.

               Wikipedia   Yahoo! Answers   Combined
Precision @ 5  0.668       0.724            0.744
MAP            0.716       0.762            0.782
53. Serendipity: "making fortunate discoveries by accident"
Serendipity = unexpectedness + relevance
• |relevant & unexpected| / |unexpected| = number of serendipitous results out of all the unexpected results retrieved
• |relevant & unexpected| / |retrieved| = number of serendipitous results out of all results retrieved
"Expected" result baselines from web search:

Baseline                                              WP            YA
Top: the 5 entities occurring most frequently
  in the top 5 search results from Bing and Google    0.63 (0.58)   0.69 (0.63)
Top–WP: same as above, but excluding the
  Wikipedia page from the results                     0.63 (0.58)   0.70 (0.64)
Rel: top 5 entities in the related-query
  suggestions provided by Bing and Google             0.64 (0.61)   0.70 (0.65)
Rel+Top: union of Top and Rel                         0.61 (0.54)   0.68 (0.57)
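Both serendipity ratios follow directly from per-result relevance and unexpectedness judgments; a minimal sketch:

    def serendipity_scores(results):
        """results: list of (relevant: bool, unexpected: bool), one per retrieved result."""
        retrieved = len(results)
        unexpected = sum(1 for _, u in results if u)
        ser = sum(1 for r, u in results if r and u)  # relevant AND unexpected
        return {
            "serendipity_of_unexpected": ser / unexpected if unexpected else 0.0,
            "serendipity_of_retrieved": ser / retrieved if retrieved else 0.0,
        }

    print(serendipity_scores([(True, True), (True, False), (False, True)]))
    # -> serendipity_of_unexpected: 0.5, serendipity_of_retrieved: ~0.33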
54. Interestingness ≠ Relevance

Interesting > Relevant                     Relevant > Interesting
Oil Spill → Penguins in Sweaters (WP)      Robert Pattinson → Water for Elephants (WP)
Lady Gaga → Britney Spears (WP)            Egypt → Cairo Conference (WP)
Netflix → Blu-ray Disc (YA)                Egypt → Ptolemaic Kingdom (WP & YA)

(Bordino, Mejova & Lalmas, 2013)
55. Assessing "interestingness": similarity (Kendall's tau-b) between result sets and reference ranking
Following (Arguello et al., 2011):
1. Labelers provide pairwise comparisons between results
2. Combine into a reference ranking
3. Compare result ranking to optimal ranking using Kendall's tau

Question                                                   Data  tau-b
Which result is more relevant to the query?                WP    0.162
                                                           YA    0.336
If someone is interested in the query, would they also     WP    0.162
be interested in the result?                               YA    0.312
Even if you are not interested in the query, is the        WP    0.139
result interesting to you personally?                      YA    0.324
Would you learn anything new about the query from          WP    0.167
the results?                                               YA    0.307
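Step 3 can be computed with SciPy, whose kendalltau defaults to the tau-b variant used here; a small sketch with illustrative ranks:

    from scipy.stats import kendalltau

    # Ranks of the same 5 results under the system ranking and under the
    # reference ranking built from pairwise labels (numbers are illustrative).
    system_ranks = [1, 2, 3, 4, 5]
    reference_ranks = [2, 1, 3, 5, 4]

    tau, p_value = kendalltau(system_ranks, reference_ranks)  # tau-b by default
    print(round(tau, 3))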
57. What are the questions to ask?
• No one measurement is perfect or complete.
• All studies (process or product) have different constraints.
• Need to ensure methods are applied consistently, with attention to reliability: what is a good signal?
• More emphasis should be placed on using mixed methods to improve the validity of the measures.
• Beware of the WEIRD syndrome (Western, Educated, Industrialized, Rich, and Democratic)
58. Acknowledgements
• Collaborators: Ioannis Arapakis, Ilaria Bordino, Barla Cambazoglu, Hakan Ceylan, Pinar Donmez, Lori McCay-Peet, Yelena Mejova, Vidhya Navalpakkam, David Warnock, and others at Yahoo! Labs.
• This talk uses some material from the tutorial "Measuring User Engagement" given at WWW 2013, Rio de Janeiro (with Heather O'Brien and Elad Yom-Tov).
Blog: labtomarket.wordpress.com