Tagging has gained tremendously in popularity over the past few years. The tagging literature offers a lot of work on people's tagging motivation, their behavior, models that describe the folksonomy generation process, emergent semantic structures, etc., but surprisingly little research showing the value of tags for searching an overloaded information space. Furthermore, there is a lot of literature on the tag or item prediction problem, but almost all of it looks at the issue from a data-driven perspective. To bridge this gap in the literature, we have conducted several in-depth studies showing the value of tags for lookup and exploratory search. We looked at the problem from a network-theoretic and an interface perspective, and we will show how useful tags are for searching. Furthermore, we reviewed the literature on memory processes from cognitive science and developed a number of novel recommender algorithms based on the ACT-R and MINERVA2 theories. We will show that these approaches not only predict tags and items extremely well, but also help explain the recommendation process better than current approaches.
From Search to Predictions in Tagged Information Spaces
1. 1
From Search to Predictions in Tagged
Information Spaces
Christoph Trattner
Know-Center
ctrattner@know-center.at
@Graz University of Technology, Austria
. Christoph Trattner 30.10.2014 – Yahoo! Labs, Barcelona
2. 2
Before I start this presentation, I will talk a bit about
myself and my background…
3. 3
Where do I come from (Austria)?
4. 4
Graz
5. 5
Academic Background
Studied Computer Science at Graz University of
Technology & University of Pittsburgh
Worked since 2009 as a scientific researcher at the KMI &
IICM (BSc 2008, MSc 2009)
My PhD thesis was on Search & Navigation in Social
Tagging Systems (defended 2012)
Since Feb. 2013 @ Know-Center
Leading the Social Computing Area
At TUG:
WebScience
Semantic Technologies
6. 6
My team
2 Post-Docs, 5 Pre-Docs (2 more to join soon)
2 MSc students
2 BSc students
DI. Dieter Theiler
DI. Dominik Kowald
Dr. Peter Kraker
Dr. Elisabeth Lex
Mag. Sebastian Dennerlein
Mag. Matthias Rella
DI. Emanuel Lacic
DI. Ilire Hasani
7. 7
Thanks to my Collaborators
8. 8
What is my group doing?
… we research novel methods and tools that exploit
social data to generate greater value for
individuals, communities, companies and society as
a whole.
Our competences:
• Network & Web Science
• Science 2.0
• Predictive Modeling
• Social Network Analysis
• Information Quality Assessment
• User Modeling
• Machine Learning and Data Mining
• Collaborative Systems
Our Services:
• Social Analytics: Hub, Expert, Community,
Influencer, Information Flow and Trend
(Event) Detection, etc.
• Information Quality Assessment
• Social & Location-based Recommender
Systems
• Customer Segmentation
• Social Systems Design
9. 9
Some industry partners...
10. 10
Current projects
BlancNoir - “Towards a Big Data recommender engine for offline
and online marketplaces”
I2F - “Towards a Social Media and Online Marketing Manager
Seminar”
Automation-X - “Towards a scalable Graph-based Visual search
solution”
Styria - “Towards a scalable crowd-based hierarchical cluster
labeling approach for willhaben.at”
TripRebel - “Towards an engaging hybrid hotel recommender
solution for triprebel.com”
CDS - “Towards a scalable Entity & Graph-based Visual search
solution for cds.at”
Exthex - “Towards an efficient viral social media marketing
campaign in Facebook and Twitter”
11. 11
The Projects
Project 1: Mendeley – UK Startup (recently acquired by Elsevier):
Interested in the problem of hierarchical concept-based search in
tagged information spaces.
Project 2: Tallinn University – Interested in the problem of
recommending tags and items in tagged information spaces.
12. 12
Ok, let’s start….
13. 13
Project 1
Mendeley – UK Startup (recently acquired by Elsevier):
Interested in the problem of hierarchical concept-based
search.
14. 14
Research Question 1:
What kind of meta-data is more useful for search in
information systems - tags or keywords?
Externals involved:
• Mendeley, London, UK
Helic, D., Körner, C., Granitzer, M., Strohmaier, M. and Trattner, C. 2012. Navigational Efficiency of Broad vs.
Narrow Folksonomies. In Proceedings of the 23rd ACM Conference on Hypertext and Social Media (HT
2012), ACM, New York, NY, USA, pp. 63-72.
15. 15
Mendeley
16. 16
[Screenshot: Mendeley Desktop, with tags and keywords highlighted]
17. 17
Task
What is the best way to extract hierarchies from tagged
information spaces? What is more useful for navigation –
keyword or tag hierarchies?
18. 18
Different types of hierarchy induction
algorithms
Helic, D., Strohmaier, M., Trattner, C., Muhr, M. and Lerman, K.: Pragmatic Evaluation of Folksonomies, In
Proceedings of the 20th international conference on World Wide Web (WWW 2011), ACM, New York, NY, USA,
417-426, 2011.
19. 19
Issue (!!!)
...no literature on what type of hierarchy is best suited
for searching...
D. J. Watts, P. S. Dodds, and M. E. J. Newman. Identity and
search in social networks. Science, 296:1302–1305, 2002.
J. M. Kleinberg. Navigation in a small world. Nature,
406(6798):845, August 2000.
20. 20
Stanley Milgram (1933-1984)
A social psychologist
Yale and Harvard University
Study on the Small World Problem,
beyond well defined communities
and relations
(such as actors, scientists, …)
"An Experimental Study of the Small World Problem"
21. 21
Set Up
Target person:
A Boston stockbroker
Three starting populations:
100 “Nebraska stockholders”
96 “Nebraska random”
100 “Boston random”
22. 22
Results
How many of the starters would be able to establish
contact with the target?
64 out of 296 reached the target
How many intermediaries would be required to link
starters with the target?
Well, that depends: the overall mean 5.2 links
Through hometown: 6.1 links
Through business: 4.6 links
Boston group faster than Nebraska groups
Nebraska stockholders not faster than Nebraska random
What form would the distribution of chain lengths
take?
23. 23
Hierarchical decentralized searcher
Information
Network
Hierarchy
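A hierarchical decentralized searcher can be sketched as a greedy walk: at each step it sees only the neighbours of the current node in the information network, and it uses the hierarchy as background knowledge to pick the neighbour that looks closest to the target. The Python below is a minimal sketch under assumptions, not the simulator used in the studies: the function names, the tree-distance heuristic, the tie-breaking rule and the `max_hops` limit are all illustrative choices.

```python
from typing import Dict, List, Optional, Sequence


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the nodes from `node` up to the root of the hierarchy."""
    path = [node]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def tree_distance(a: str, b: str, parent: Dict[str, Optional[str]]) -> int:
    """Number of hierarchy edges between a and b via their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes do not share a hierarchy root")


def greedy_search(
    network: Dict[str, Sequence[str]],
    parent: Dict[str, Optional[str]],
    start: str,
    target: str,
    max_hops: int = 20,  # assumed give-up threshold, not taken from the talk
) -> Optional[List[str]]:
    """Walk the network greedily; return the click path or None on failure."""
    path, visited, current = [start], {start}, start
    while current != target and len(path) <= max_hops:
        candidates = [n for n in network.get(current, ()) if n not in visited]
        if not candidates:
            return None  # dead end: every neighbour was already visited
        # Only local knowledge is used: neighbours are ranked by hierarchy distance.
        current = min(candidates, key=lambda n: (tree_distance(n, target, parent), n))
        visited.add(current)
        path.append(current)
    return path if current == target else None
```

The length of the returned path minus one is the number of clicks, which is the quantity such a simulation would compare across hierarchies.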
24. 24
Results
25. 25
Validation
We compared simulations with
human click trails from the online game
The Wiki Game (http://thewikigame.com/)
The dataset contains 1,500,000
click trails of more
than 500,000 users with
(start; target) information.
26. 26
Hierarchy Creation (1)
Two types of hierarchies were evaluated
1.) First type is based on our previous work
Categorial Concepts:
Tags from Delicious
Category labels from Wikipedia
Similarity Graph
Latent Hierarchical Taxonomy
Delicious Tag Dataset:
440,000 tags, 580,000 articles and
3,400,000 tag assignments
Wikipedia Category Label Dataset:
2,300,000 category labels,
4,500,000 articles, 30,000,000 category
label assignments
27. 27
Hierarchy Creation (2)
2.) Second type is based on the work of [Muchnik et al. 2007]
Simple idea: The algorithm iterates through all
links in the network and decides whether each link is
of a hierarchical type, in which case it
remains in the network; otherwise it is
removed.
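The link-filtering idea can be sketched in a few lines of Python. The slide does not state the criterion that decides whether a link is hierarchical, so the sketch takes that decision as a pluggable predicate. The in-degree rule shown is a purely hypothetical stand-in for illustration and should not be read as the rule of Muchnik et al.; all names and the `ratio` parameter are assumptions.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[str, str]  # (source article, target article)


def extract_hierarchy(
    edges: Sequence[Edge], is_hierarchical: Callable[[str, str], bool]
) -> List[Edge]:
    """Keep only the links that the predicate classifies as hierarchical."""
    return [(src, dst) for src, dst in edges if is_hierarchical(src, dst)]


def make_in_degree_rule(edges: Sequence[Edge], ratio: float = 2.0) -> Callable[[str, str], bool]:
    """Hypothetical rule: a link is hierarchical if it points to a node whose
    in-degree is at least `ratio` times that of the source (assumed threshold)."""
    in_degree = Counter(dst for _, dst in edges)

    def rule(src: str, dst: str) -> bool:
        return in_degree[dst] >= ratio * max(in_degree[src], 1)

    return rule
```

Passing the predicate in as an argument keeps the iteration over links separate from the classification rule, so a different criterion can be swapped in without touching the loop.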
Directed link-network dataset of the
English-Wikipedia from February
2012.
All in all, the dataset includes
around 10,000,000 articles and
around 250,000,000 links
Muchnik, L., Itzhack, R., Solomon S. and Louzoun Y.: Self-emergence of knowledge trees: Extraction
of the Wikipedia hierarchies, PHYSICAL REVIEW E 76, 016106 (2007)
28. 28
Validation
Human Searchers
29. 29
...OK, let's come back to the Mendeley "problem"...
30. 30
Are keyword hierarchies better for search
than social tag hierarchies?
[Plot panels: Tags, Keywords]
Results: Our Greedy Navigator (= Simulator) needs on average one click
more with keywords to reach the target node than with tags
Results:
With simulations we find that tag-based
hierarchies are more efficient
for navigation than keywords
31. 31
...OK, let's move on to some prediction stuff
32. 32
Project 2
Tallinn University – Interested in the problem of
recommending items and tags to users in social
tagging systems.
33. 33
Research Question 2:
To what extent is human cognition theory applicable to
the problem of predicting tags and items to users?
Externals involved:
• PUC - Chile, UFCG – Brazil
34. 34
Motivation
Social tags help you to classify Web content better [Zubiaga 2012]
They help people to navigate large knowledge repositories better
[Helic et al. 2012]
They help people to search for information faster [Trattner et al. 2012]
However, there is an issue with social tags…
People are typically too lazy to apply social tags (!!)
Zubiaga, A. (2012). Harnessing Folksonomies for Resource Classification. arXiv preprint arXiv:1204.6521.
Trattner, C., Lin, Y. L., Parra, D., Yue, Z., Real, W., & Brusilovsky, P. (2012, June). Evaluating tag-based information
access in image collections. In Proceedings of the 23rd ACM conference on Hypertext and social media (pp. 113-
122). ACM.
Helic, D., Körner, C., Granitzer, M., Strohmaier, M., & Trattner, C. (2012, June). Navigational efficiency of broad vs.
narrow folksonomies. In Proceedings of the 23rd ACM conference on Hypertext and social media (pp. 63-72). ACM.
35. 35
Motivation
To overcome this issue, some smart people started to invent mechanisms that
help the user apply tags, known as social tag recommender
systems, based on:
Collaborative Filtering
User-based and item-based CF [Marinho et al. 2008]
Matrix Factorization
FM, PITF [Rendle et al. 2010, 2011, 2012]
Graph Structures
Adapted PageRank and FolkRank [Hotho et al. 2006]
Topic Models
Latent Dirichlet Allocation (LDA) [Krestel et al. 2009, 2010, 2011]
36. 36
Why do we need cognitive models?
First answer: We do not like data-driven approaches…
Me: OK
Second answer: We can understand things better…
…why is something happening and how…
37. 37
MINERVA2
38. 38
Approach
Based on a model of human cognition (derived from MINERVA2 [Kruschke et al., 1992])
39. 39
Evaluation
Wikipedia
p-core pruning (p = 14)
To measure the performance of our approach, we split our dataset into two
subsets: 80% for training and 20% for testing
Metrics: Precision, Recall, F1-score, MRR, MAP
As baseline algorithm we chose Latent Dirichlet Allocation (LDA)
[Krestel et al. 2009]
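For readers unfamiliar with the metrics listed above, the following Python sketch shows one common way to compute them for a single user's ranked tag list. The slides do not give the exact definitions used, so the conventions here are assumptions: precision divides by k, average precision divides by the number of relevant tags, and the recommended list contains no duplicates. MRR and MAP are then the means of the per-user reciprocal rank and average precision.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f1(
    recommended: Sequence[str], relevant: Set[str], k: int
) -> Tuple[float, float, float]:
    """Precision@k, Recall@k and their harmonic mean (F1) for one user."""
    hits = sum(1 for tag in recommended[:k] if tag in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def reciprocal_rank(recommended: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant tag, or 0 if none is recommended."""
    for rank, tag in enumerate(recommended, start=1):
        if tag in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(recommended: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@rank taken at each rank where a relevant tag appears."""
    hits, total = 0, 0.0
    for rank, tag in enumerate(recommended, start=1):
        if tag in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean(values: Sequence[float]) -> float:
    """Average over users; gives MRR or MAP when fed per-user scores."""
    return sum(values) / len(values) if values else 0.0
```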
40. 40
Results
Results:
3Layers reaches higher levels of
accuracy than the pure LDA
approach.
41. 41
ACT-R
42. 42
ACT-R
43. 43
Interestingly, in the literature on tagging
systems, temporal processes are typically modeled
with an exponential function...
D. Yin, L. Hong, and B. D. Davison. Exploiting session-like behaviors in tag prediction. In
Proceedings of the 20th international conference companion on World wide web, pages
167–168. ACM, 2011.
L. Zhang, J. Tang, and M. Zhang. Integrating temporal usage pattern into personalized tag
prediction. In Web Technologies and Applications, pages 354–365. Springer, 2012
44. 44
Empirical Analysis: BibSonomy (1)
Linear distribution with log-scale on Y-axis: exponential function
Linear distribution with log-scale on X- and Y-axes: power function
45. 45
Empirical Analysis: BibSonomy (2)
Exponential distribution
R² = 31%
Power distribution
R² = 89%
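The comparison between the two decay models can be reproduced in spirit with two straight-line fits: an exponential function is linear when only the Y-axis is log-scaled, and a power function is linear when both axes are log-scaled. The Python sketch below computes R² for both cases using ordinary least squares. How R² was computed for the slides is not stated, so fitting in log space is an assumption, and the function names and input format are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple


def linear_fit_r2(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def compare_decay_models(ages: Sequence[float], counts: Sequence[float]) -> Dict[str, float]:
    """R^2 of an exponential fit (log y vs. x) and a power fit (log y vs. log x).

    `ages` are times since last tag use and `counts` are re-use frequencies;
    both must be strictly positive so that the logarithms are defined.
    """
    log_counts = [math.log(c) for c in counts]
    log_ages = [math.log(a) for a in ages]
    return {
        "exponential": linear_fit_r2(ages, log_counts)[2],
        "power": linear_fit_r2(log_ages, log_counts)[2],
    }
```

On data that follows a power law, the log-log fit is close to a perfect line while the semi-log fit is visibly curved, which is the pattern the two R² values above describe.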
46. 46
Results:
The decay factor is better modeled as a
power function than as an exponential function
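A power-law decay is what the base-level learning (BLL) equation of ACT-R uses: the activation of a memory item is the logarithm of the sum of its past usages, each weighted by the elapsed time raised to the power of minus d. The Python sketch below applies that equation to tag usage timestamps. It is an illustration, not the talk's implementation: the function names and the input format are assumptions, and d = 0.5 is the value conventionally used in ACT-R rather than one confirmed by the slides.

```python
import math
from typing import Dict, List, Sequence, Tuple


def base_level_activation(timestamps: Sequence[float], now: float, d: float = 0.5) -> float:
    """ACT-R base-level learning: B = ln(sum over past usages of (now - t)^(-d))."""
    total = sum((now - t) ** (-d) for t in timestamps if t < now)
    return math.log(total) if total > 0 else float("-inf")


def rank_tags_by_bll(
    tag_usages: Dict[str, Sequence[float]], now: float, d: float = 0.5
) -> List[Tuple[str, float]]:
    """Rank a user's tags by activation: frequent and recent tags come first."""
    scores = {tag: base_level_activation(ts, now, d) for tag, ts in tag_usages.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because each usage contributes separately to the sum, the score combines how often and how recently a tag was used, which is why a single very recent usage can outrank several old ones.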
47. 47
Experiment 1: Predicting re-use of tags
48. 48
Results: Predicting re-use of tags
[Plot legend: BLLAC, BLL, MPU, GIRP]
49. 49
Results: Recall / Precision
Results:
BLLAC performs fairly well in
predicting the re-use of tags
50. 50
Experiment 2: Recommending Tags
51. 51
Results: Recall-Precision plots
[Plot: CiteULike]
The time-dependent
approaches outperform the
state-of-the-art
BLL+MPr reaches the
highest level of accuracy
52. 52
Results: Recall / Precision
Results:
BLL approaches outperform current
state-of-the-art tag recommender
approaches.
53. 53
...how about runtime?
54. 54
Results: Runtime
BLL+C needs only around 1s to generate tag-recommendations
for 5,500 users in BibSonomy
55. 55
Results: Runtime
57. 57
Our Approach = CIRTT
2 main steps
First step:
– User-based Collaborative Filtering (CF) to get
candidate items of similar users
Second step:
– Item-based CF to rank these candidate items using
the BLL equation to integrate tag and time
information:
58. 58
How does it perform?
3 freely-available folksonomy datasets
– BibSonomy (~ 340,000 tag assignments)
– CiteULike (~ 100,000 tag assignments)
– MovieLens (~ 100,000 tag assignments)
Original datasets (no p-core pruning) Doerfel et al. (2013)
80/20 split (for each user 20% most recent bookmarks/posts
in test-set, rest in training-set)
IR metrics: nDCG@20, MAP@20, Recall@20, Diversity and
User Coverage
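The per-user 80/20 split described on this slide can be sketched as follows in Python. Posts are assumed to be `(user, item, timestamp)` tuples, which is an illustrative format rather than the one used in the study. Rounding the test-set size down is also an assumption, and it means users with fewer than five posts contribute nothing to the test set.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Post = Tuple[str, str, float]  # (user, item, timestamp), an assumed record layout


def split_per_user(
    posts: Sequence[Post], test_fraction: float = 0.2
) -> Tuple[List[Post], List[Post]]:
    """Put each user's most recent posts into the test set, the rest into training."""
    by_user: Dict[str, List[Post]] = defaultdict(list)
    for post in posts:
        by_user[post[0]].append(post)

    train: List[Post] = []
    test: List[Post] = []
    for user_posts in by_user.values():
        ordered = sorted(user_posts, key=lambda p: p[2])  # oldest first
        n_test = int(len(ordered) * test_fraction)  # rounded down (assumption)
        cut = len(ordered) - n_test
        train.extend(ordered[:cut])
        test.extend(ordered[cut:])
    return train, test
```

Splitting by time within each user, rather than at random, keeps the evaluation honest for time-aware models: the recommender never sees posts that are newer than the ones it has to predict.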
59. 59
Baseline Methods
• Most Popular (MP)
• User-based Collaborative Filtering (CF)
• Two alternative approaches based on tag and time
information
– Zheng et al. (2011) exponential function
– Huang et al. (2014) linear function
(remember: our CIRTT uses a power function)
60. 60
Results: nDCG plots
CIRTT reaches the highest level of accuracy
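For reference, nDCG at a cutoff k can be computed as in the Python sketch below. It assumes binary relevance (an item is either in the user's test set or not) and a log2 rank discount. These are common conventions, but the slides do not say which variant was used, so treat the details as assumptions.

```python
import math
from typing import Sequence, Set


def ndcg_at_k(recommended: Sequence[str], relevant: Set[str], k: int = 20) -> float:
    """Normalized discounted cumulative gain with binary relevance."""
    # Gain of 1 for each relevant item, discounted by log2(rank + 1).
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    # Ideal ranking: all relevant items placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

Unlike Recall@20, this metric rewards placing relevant items near the top of the list, so two recommenders with the same recall can still differ in nDCG.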
61. 61
Results: Recall plots
CIRTT reaches the highest level of accuracy
62. 62
Results
Results:
CIRTT works quite well compared to
the current state-of-the-art in tag-based
item recommender systems
63. 63
What are we...
...currently working on...
64. 64
MINERVA2 + ACT-R
65. 65
Time in Semantic vs. Lexical Memory
66. 66
Topical vs. Lexical shift in time
Topics
Tags
Results:
Topical shift in time is less
pronounced than lexical shift
67. 67
Results: Recall / Precision
68. 68
Describer vs. Categorizer
M. Strohmaier, C. Koerner, and R. Kern. Understanding why users tag: A survey of tagging motivation
literature and results from an empirical study. Journal of Web Semantics, 17:1–11, 2012.
69. 69
Results: Categorizer vs. Describer
70. 70
... OK, that's basically it
71. 71
Code and Framework
https://github.com/learning-layers/TagRec/
72. 72
Thank you!
Christoph Trattner
Email: trattner.christoph@gmail.com
Web: christophtrattner.info
Twitter: @ctrattner
Sponsors: