My keynote at the 1st International Workshop on Social Multimedia Computing (SMC), Melbourne, Australia, 9 July 2012.
see: http://www.icme2012.org or
http://smc2012.idm.pku.edu.cn/
1. Security and trust in social media networks
Prof. Touradj Ebrahimi
touradj.ebrahimi@epfl.ch
The First International Workshop on Social Multimedia Computing (SMC2012)
Melbourne, Australia
9 July 2012
Multimedia Signal Processing Group
Swiss Federal Institute of Technology
2. Social media landscape 2012
http://www.fredcavazza.net/2012/02/22/social-media-landscape-2012/
3. Popularity of social networks
March 2012
May 2012
http://blog.tweetsmarter.com/social-media/spring-2012-social-media-user-statistics/
4. Popularity of social networks (cont.)
1+ billion photos
7+ billion photos
~140 billion photos
72 hours of video
every minute
April 2012
5. Popularity of social networks (cont.)
February 2012
http://www.trendweek.com/en/60-seconds-of-social-media-sharing/
6. Popularity of social networks (cont.)
January 2012
http://online.wsj.com/article/SB10001424052970204653604577249341403742390.html
7. Trust & security issues in social networks
8. Trust & security issues in social networks (cont.)
• The Socialcam application automatically shares the videos
you watch with your Facebook friends
• It may be doing you more harm than good!
9. Trust & security issues in social networks (cont.)
• Spotify and Yahoo News Activity automatically publish the
songs you have listened to and the news you have read on your
profile page
10. Trust & security issues in social networks (cont.)
• “Responsible” sharing (e.g. a friend posted an embarrassing
photo of young political candidate Emma Kiernan enjoying a party)
Anybody can post any picture of you on Facebook at any time!
Even if deleted, it is there forever, stored in the vast Internet
memory bank to be found again and again!
http://newsbizarre.com/2009/05/emma-kiernan-facebook-photo-puts-young.html
11. Trust & security issues in social networks (cont.)
• “Responsible” sharing (e.g. top model Rosanagh Wypych
posted photos of herself and friends drinking alcohol and
consuming cannabis)
http://www.3news.co.nz/Top-Model-Rosanagh-worried-after-Facebook-pot-pic/tabid/418/articleID/224047/Default.aspx
12. Trust & security issues in social networks (cont.)
• Wrong tags in Flickr
Challenges:
• Identify the most appropriate tags
• Eliminate noisy or spam tags
Only around 50% of tags are truly related to an image
[Kennedy et al., ACM MIR 2006]
13. Trust & security issues in social networks (cont.)
• Spam bookmarks in Delicious
14. Trust & security solutions in social networks
15. Trust & security solutions in social networks (cont.)
• Limit the number of people you share content and
communications with by creating private groups
(e.g. Google+ Circles, Facebook Smart Lists)
16. Trust & security solutions in social networks (cont.)
• Limit the time to view shared content
(e.g. the SnapChat application – though it cannot
guarantee that your messaging data will be deleted
in all instances)
17. Trust & security solutions in social networks (cont.)
• Manage your account (friends, photos, comments, posts)
• Make sure that what you are sharing or posting is not going
to cause regret
http://www.huffingtonpost.com/2012/03/20/social-media-privacy-infographic_n_1367223.html
18. Trust & security before social media
• E-mail
• Web search
• Web videos
• Blogs
• Online shopping systems
• Peer-to-peer (P2P) networks
19. Anti-spam strategies in online systems
[Heymann, Koutrika, Garcia-Molina, IEEE IC 2007]
20. Trust and reputation in online systems
21. Trust and reputation in online systems (cont.)
Email spam filtering
22. Model of a social tagging system
[Ivanov, Vajda, Lee, Ebrahimi, IEEE SPM 2012]
23. Categorization of trust models
[Ivanov, Vajda, Lee, Ebrahimi, IEEE SPM 2012]
24. User versus content trust modeling
• User trust modeling is more popular than content trust
modeling:
• User trust models are less complex than content trust models
• User trust models can quickly adapt to the constantly evolving and
changing environment in social systems, due to the type of features
used for modeling, and thus remain applicable longer than content
trust models, without the need to create new models
• User trust modeling has the disadvantage of a “broad brush”,
i.e. it may be excessively strict if a user happens to post one
bit of questionable content among otherwise legitimate content
• “Subjectivity” in classifying spam and non-spam content/
users remains a fundamental issue, i.e. what is spam content or
a spammer to one person may be interesting to another
25. Summary of representative recent techniques
Reference | Trust model | Media | Method | Dataset
Gyongyi et al. 2004 | content | web pages | an iterative approach, called TrustRank, to propagate trust scores to all nodes in the graph by splitting the trust score of a node among its neighbors according to a weighting scheme | AltaVista, real
Koutrika et al. 2008 | content | bookmarks | a coincidence-based model for query-by-tag search which estimates the level of agreement among different users in the system for a given tag | Delicious, real & simulated
Wu et al. 2005 | content | images | a computer vision technique based on low-level image features to detect embedded text and computer-generated graphics | SpamArchive & Ling-Spam, real
Liu et al. 2009 | content & user | bookmarks | an iterative approach to identify spam content by its information value extracted from the collaborative knowledge | Delicious, real
Bogers and Van den Bosch 2008 | content & user | bookmarks | KL-divergence to measure the similarity between user language models and new posts | BibSonomy & CiteULike, real
Ivanov et al. 2012 | user | images | an approach based on the feedback from other users who agree or disagree with a tag associated with an image | Panoramio, real
[Ivanov, Vajda, Lee, Ebrahimi, IEEE SPM 2012]
26. Summary of representative recent techniques (cont.)
Reference | Trust model | Media | Method | Dataset
Xu et al. 2006 | user | bookmarks | an iterative approach to compute the goodness of each tag with respect to a content and the authority scores of the users | MyWeb 2.0, real
Krestel and Chen 2008 | user | bookmarks | a TrustRank-based approach using features which model tag co-occurrence, content co-occurrence and co-occurrence of tag-content | BibSonomy, real
Benevenuto et al. 2009 | user | videos | a supervised learning approach applied on features that reflect users’ behavior through video responses | YouTube, real
Lee et al. 2010 | user | tweets | a machine learning approach applied on social honeypots including users’ profile and tweets’ features | Twitter, real & simulated
Krause et al. 2008 | user | bookmarks | a machine learning approach applied on a user’s profile, bookmarking activity and context-of-tags features | BibSonomy, real
Markines et al. 2009 | user | bookmarks | a machine learning approach applied on tag-, content- and user-based features | BibSonomy, real
Caverlee et al. 2008 | user | user profiles | an approach to compute a dynamic trust score, called SocialTrust, depending on the quality of the relationship and personalized feedback ratings | MySpace, real
[Ivanov, Vajda, Lee, Ebrahimi, IEEE SPM 2012]
27. User trust modeling
• User reliability based model [Ivanov et al. 2012]
• User trust modeling in Panoramio images
Tagged incorrectly?
28. User trust modeling
• User reliability based model (cont.) [Ivanov et al. 2012]
• Idea:
true flag (correct tag)
false flag (wrong tag + suggest a correct tag)
• The more misplacements a user has, the less trusted he/she is
• User trust ratio: the number of true flags divided by the total
number of flags over all images tagged by that user
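The user trust ratio on this slide can be sketched in a few lines; the function name and the sample flag lists below are illustrative assumptions, not part of the actual system.

```python
# Hypothetical sketch of the user trust ratio: the fraction of "true"
# flags among all flags that other users assigned to this user's tags.

def trust_ratio(flags):
    """flags: one boolean per flag a user's tags received
    (True = another user confirmed the tag, False = disputed it)."""
    if not flags:
        return 0.0  # no feedback yet: treat the user as untrusted
    return sum(flags) / len(flags)

# Alice's tags were confirmed 9 times and disputed once -> high trust.
alice = trust_ratio([True] * 9 + [False])
# Eve's tags were disputed 8 times out of 10 -> low trust.
eve = trust_ratio([True] * 2 + [False] * 8)
print(alice, eve)  # 0.9 0.2
```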
29. User trust modeling
• User reliability based model (cont.) [Ivanov et al. 2012]
• Users can collaboratively eliminate a spammer
30. Example: Geotag propagation with trust modeling
• Geotag propagation = geotagging
• GPS coordinates are not available for the majority of web
photos
• The GPS sensor in a camera provides only the location of the
photographer, not that of the captured landmark
31. Example: Geotag propagation with trust modeling
• GPS and Wi-Fi geotagging sometimes determine a wrong
location due to noise in the communication channel
32. Example: Geotag propagation with trust modeling
[Ivanov, Vajda, Lee, Ebrahimi, MTAP 2012]
33. Geotag propagation with trust modeling
http://www.flickr.com/photos/gustavog/9708628
34. Performance evaluation
• Dataset
• 1320 images ‒ 22 cities, 3 sublocations per city, 20 sample
images per sublocation
• 47 users; 3295 tags (658 unique)
35. The distribution of the normalized trust values
36. The distribution of the normalized trust values for all users
37. The recognition rate of the geotag propagation system
38. Open issues & challenges
• Publicly available datasets:
• Datasets from different social networks (or even multiple
datasets from one social network) for evaluating trust modeling
approaches are rarely published, which makes it difficult to
compare the results and performance of different trust models
• Most datasets support evaluating only one aspect of trust
modeling, either user or content trust modeling, while evaluating
the other aspect requires introducing simulated objects into
real-world social tagging datasets
39. Open issues & challenges (cont.)
• Dynamics of trust:
• A user’s trust tends to vary over time with the user’s
experience and the evolution of social networks
• Only a few approaches deal with the dynamics of trust by
distinguishing between recent and old tags
40. Open issues & challenges (cont.)
• Multilingualism:
• Most existing trust modeling approaches based on text
information assume monolingual environments
• Some text information may be regarded as wrong due to
language differences – users come from various countries, so
multiple languages appear simultaneously in tags and comments
41. Open issues & challenges (cont.)
• Interaction across social networks:
• How can trust models be effectively connected and shared
across domains? E.g. users can use their Facebook accounts to
log in to other social network services
42. Open issues & challenges (cont.)
• Multimodal analysis:
• Most current techniques for noise and spam reduction focus
only on textual tag processing and user profile analysis, while
audio and visual features of multimedia content can also
provide useful information about the relevance of the content
and the content-tag relationship
43. References
B. Sigurbjornsson, and R. van Zwol, "Flickr tag recommendation based on collective knowledge," in Proc.
WWW, Apr. 2008, pp. 327–336.
L. S. Kennedy, S.-F. Chang, and I. V. Kozintsev, “To search or to label?: Predicting the performance of search-
based automatic image classifiers,” in Proc. ACM MIR, Oct. 2006, pp. 249–258.
K. Liu, B. Fang, and Y. Zhang, “Detecting tag spam in social tagging systems with collaborative knowledge,” in
Proc. IEEE FSKD, Aug. 2009, pp. 427–431.
P. Heymann, G. Koutrika, and H. Garcia-Molina, “Fighting spam on social web sites: A survey of approaches
and future challenges,” IEEE Internet Comput., vol. 11, no. 6, pp. 36–45, Nov. 2007.
X. Li, C. Snoek, and M. Worring, "Learning tag relevance by neighbor voting for social image retrieval," in
Proc. ACM MIR, Oct. 2008, pp. 180-187.
B. Markines, C. Cattuto, and F. Menczer, “Social spam detection,” in Proc. ACM AIRWeb, Apr. 2009, pp. 41–
48.
B. Krause, C. Schmitz, A. Hotho, and G. Stumme, “The anti-social tagger: Detecting spam in social bookmarking
systems,” in Proc. ACM AIRWeb, Apr. 2008, pp. 61–68.
G. Koutrika, F. A. Effendi, Z. Gyongyi, P. Heymann, and H. Garcia-Molina, “Combating spam in tagging
systems: An evaluation,” ACM TWEB, vol. 2, no. 4, pp. 22:1–22:34, Oct. 2008.
Z. Gyongyi, H. Garcia-Molina, and J. Pedersen, “Combating web spam with TrustRank,” in Proc. VLDB, Aug.
2004, pp. 576–587.
C.-T. Wu, K.-T. Cheng, Q. Zhu, and Y.-L. Wu, “Using visual features for anti-spam filtering,” in Proc. IEEE ICIP,
Sep. 2005, vol. 3, pp. III-509–512.
44. References (cont.)
I. Ivanov, P. Vajda, J.-S. Lee, and T. Ebrahimi, "In tags we trust: Trust modeling in social tagging of multimedia
content," in IEEE SPM, vol. 29, no. 2, pp. 98-107, Mar. 2012.
I. Ivanov, P. Vajda, J.-S. Lee, L. Goldmann, and T. Ebrahimi, “Geotag propagation in social networks based on
user trust model,” MTAP, vol. 56, no. 1, pp. 155-177, Jan. 2012.
I. Ivanov, P. Vajda, L. Goldmann, J.-S. Lee, and T. Ebrahimi, "Object-based tag propagation for semi-
automatic annotation of images," in Proc. ACM MIR, Mar. 2010, pp. 497-506.
T. Bogers and A. Van den Bosch, “Using language models for spam detection in social bookmarking,” in Proc.
ECML PKDD, Sep. 2008, pp. 1–12.
Z. Xu, Y. Fu, J. Mao, and D. Su, “Towards the semantic web: Collaborative tag suggestions,” in Proc. ACM
WWW, May 2006.
R. Krestel and L. Chen, “Using co-occurrence of tags and resources to identify spammers,” in Proc. ECML
PKDD, Sep. 2008, pp. 38–46.
F. Benevenuto, T. Rodrigues, V. Almeida, J. Almeida, and M. Gonçalves, “Detecting spammers and content
promoters in online video social networks,” in Proc. ACM SIGIR, Jul. 2009, pp. 620–627.
K. Lee, J. Caverlee, and S. Webb, “Uncovering social spammers: social honeypots + machine learning,” in
Proc. ACM SIGIR, Jul. 2010, pp. 435–442.
M. G. Noll, C. A. Yeung, N. Gibbins, C. Meinel, and N. Shadbolt, “Telling experts from spammers: Expertise
ranking in folksonomies,” in Proc. ACM SIGIR, Jul. 2009, pp. 612–619.
J. Caverlee, L. Liu, and S. Webb, “SocialTrust: Tamper-resilient trust establishment in online communities,” in
Proc. ACM JCDL, Jun. 2008, pp. 104–114.
A. Hotho, D. Benz, R. Jäschke, and B. Krause, Eds., ECML PKDD Discovery Challenge, Sep. 2008.
Available at: http://www.kde.cs.uni-kassel.de/ws/rsdc08.
45. Any questions?
Prof. Touradj Ebrahimi
Touradj.Ebrahimi@epfl.ch
MMSPG EPFL
The contribution of Ivan Ivanov, who diligently prepared many of the materials discussed, is hereby acknowledged.
http://nokiaconnects.com/2012/03/22/photo-mad-why-are-we-so-obsessed-with-taking-pictures/
http://blog.1000memories.com/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox
http://www.youtube.com/t/press_statistics
Tags, when combined with search technologies, are essential in resolving user queries targeting shared content.
However, the countermeasures developed for e-mail and web spam do not directly apply to social networks.
http://blog.jungletechnology.com/wp-content/uploads/2012/01/spam.jpg
http://gigaom.com/video/estimate-20-of-web-videos-are-spam/
http://www.findmysoft.com/img/news/inside/Kanye-West-Dead_1256213554.png
http://www.free-web-submission.co.uk/blog/wp-content/uploads/2011/08/blog-comment-spam.jpg
In order to reduce or eliminate spam, various anti-spam methods have been proposed in state-of-the-art research. Heymann et al. [6] classified anti-spam strategies into three categories: prevention, detection, and demotion. Prevention-based approaches aim at making it difficult for spam content to enter social tagging systems by restricting certain access types through interfaces (such as CAPTCHA [7] or reCAPTCHA [8]) or through usage limits (such as a tagging quota, e.g., Flickr introduced a limit of 75 tags per photo [9]). Detection approaches identify likely spam either manually or automatically, making use of, for example, machine learning (such as text classification) or statistical analysis (such as link analysis), and then delete the spam content or visibly mark it as hidden to users. Finally, demotion-based approaches reduce the prominence of content likely to be spam. For instance, rank-based methods produce an ordering of a system’s content, tags or users based on their trust scores. Prevention-based approaches can be considered a type of precaution against spammers. However, they cannot completely secure a social tagging system. Some studies, e.g. [10], showed that CAPTCHA systems can be defeated by computers with around 90% accuracy, using, for example, optical character recognition or shape context matching. Even if prevention methods were perfect, there would still be a possibility that social systems get polluted with spam (malicious) or irrelevant tags. Therefore, detection and demotion via trust modeling are required to keep a system free of noise and spam.
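The demotion strategy above can be sketched in a few lines: instead of deleting likely spam, content is simply ranked by trust score so low-trust items sink. The items and trust values below are invented for illustration.

```python
# Rank-based demotion: order content by descending trust score so that
# likely spam falls to the bottom instead of being deleted outright.
# The tags and trust scores here are made-up example data.

items = [
    {"tag": "eiffel tower", "trust": 0.92},
    {"tag": "buy cheap pills", "trust": 0.05},
    {"tag": "paris", "trust": 0.71},
]

ranked = sorted(items, key=lambda it: it["trust"], reverse=True)
print([it["tag"] for it in ranked])
# -> ['eiffel tower', 'paris', 'buy cheap pills']
```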
http://www.wpconfig.com/2010/07/04/tips-to-increase-your-pagerank
Some examples of trust and reputation systems:
Google’s web page rank – each web page is ranked depending on the number of links pointing at it from different sites.
Amazon – members can submit a review and a rating in the range of 1 to 5 stars, and users can vote on reviews as being helpful or not helpful. Disadvantage: there are many reports of attacks, since anybody can vote without becoming a member.
http://www.spam-filter-express.com/images/bayesian.gif
Some examples of trust and reputation systems:
eBay – buyer and seller rate each other as positive, negative or neutral after completion of a transaction; even though it is very primitive, studies show that ratings on eBay have a surprisingly positive influence.
Email spam filtering – a naïve Bayesian classifier learns the probability of each word within a training set of spam and non-spam emails, and then uses these probabilities to calculate the probability that a test email is spam, depending on whether it contains particular words or not.
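A toy version of such a naïve Bayesian filter might look as follows; the tiny corpus is invented for illustration, and a real filter would of course train on thousands of messages.

```python
# Toy naive Bayesian spam filter: learn per-word probabilities from a
# labelled corpus, then compare log-probabilities of a new message under
# the spam and ham models. Laplace smoothing handles unseen words.
import math
from collections import Counter

spam = ["cheap pills online", "cheap replica watches", "online casino win"]
ham = ["project meeting notes", "lunch tomorrow", "meeting agenda attached"]

def word_counts(docs):
    return Counter(w for d in docs for w in d.split())

spam_c, ham_c = word_counts(spam), word_counts(ham)
vocab = set(spam_c) | set(ham_c)

def log_prob(msg, counts, prior):
    # Add-one smoothing so unseen words do not zero out the probability.
    n = sum(counts.values())
    return math.log(prior) + sum(
        math.log((counts[w] + 1) / (n + len(vocab))) for w in msg.split()
    )

def is_spam(msg):
    return log_prob(msg, spam_c, 0.5) > log_prob(msg, ham_c, 0.5)

print(is_spam("cheap pills"))       # True
print(is_spam("meeting tomorrow"))  # False
```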
The countermeasures developed for e-mail and web spam do not directly apply to social networks.

In a social tagging system, spam or noise can be injected at three different levels: spam content, spam tag-content association and spammer [18]. Trust modeling can be performed at each level separately (e.g. [18]), or different levels can be considered jointly to produce trust models; for example, to assess a user’s reliability, one can consider not only the user profile, but also the content that the user uploaded to a social system (e.g. [19]). We categorize trust modeling approaches into two classes according to the target of trust, i.e. user and content trust modeling (shown in Figure 3).
A user’s trust in a social tagging system is dynamic, i.e. it changes over time. It is therefore better to consider the tagging history of a user, because consistently good behavior in the past can suddenly change with a few mistakes, which consequently ruins his/her trust in tagging.
It is noticeable that user trust modeling is more popular than content trust modeling. One reason for this is that the former is less complex than the latter, i.e. the number of models required is usually much larger in content trust modeling. The other reason is that user trust models can quickly adapt to the constantly evolving and changing environment in social systems, due to the type of features used for modeling, and thus remain applicable longer than content trust models, without the need to create new models. On the other hand, user trust modeling has the disadvantage of a “broad brush”, i.e. it may be excessively strict if a user happens to post one bit of questionable content among otherwise legitimate content. The trustworthiness of a user is often judged based on the content that the user uploaded to a social system, and thus “subjectivity” in discriminating spammers from legitimate users remains an issue for user trust modeling, as in content trust modeling.
Table I summarizes representative recent approaches for trust modeling in social tagging. The presented approaches are sorted by complexity from simple to advanced, separately for content and user trust models.
In this section, we describe our own approach for user trust modeling in image tagging, proposed in Ivanov et al. [8]. It consists of three steps:

User evaluation – we evaluate the trust or reliability of users from their past tagging behavior, in order to distinguish users who provide reliable geotags from those who do not. Assuming that there are L users who tag M training images, a matrix R(i, u), with i ∈ [1…M] and u ∈ [1…L], records whether user u tagged image i correctly. Comparing the propagated tags to ground truth tags can be done automatically using tag similarity measures, for example WordNet [26] or Google distance [27]; nevertheless, we considered only manually defined ground truth in our experiments.

Trust model creation – a trust value for user u, trust_Ivanov(u), is computed as the percentage of correctly tagged images among all images tagged by user u.

Tag propagation – only tags from trusted users are propagated to other photos in the dataset: if the user trust value trust_Ivanov(u) exceeds a predefined threshold T̂, then all of his/her tags are propagated; otherwise, none of his/her tags are propagated.

In this approach, ground truth data are used to estimate the user trust value. However, for a practical photo sharing system, such as Panoramio, it is not necessary to collect ground truth data, since user feedback can replace it.
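These three steps can be sketched roughly as follows; the matrix values, threshold and user indices are invented, and the real system compares tags against ground truth or user feedback rather than a hand-filled table.

```python
# Sketch of the trust computation above: R[i][u] is 1 if user u tagged
# training image i correctly, 0 if incorrectly, and None if u did not
# tag image i. trust(u) is the fraction of correctly tagged images among
# all images tagged by u; tags propagate only when trust(u) > T_hat.

R = [
    # user:  0   1   2
    [1,    0,  None],   # image 0
    [1,    0,  1],      # image 1
    [1,    1,  0],      # image 2
]

def trust(u):
    tagged = [R[i][u] for i in range(len(R)) if R[i][u] is not None]
    return sum(tagged) / len(tagged) if tagged else 0.0

T_hat = 0.5  # assumed threshold for this illustration
trusted = [u for u in range(3) if trust(u) > T_hat]
print(trusted)  # -> [0]  (only user 0's tags get propagated)
```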
The main idea is that users evaluate tagged images by assigning a true or a false flag to the tag associated with an image. If a user assigns a false flag, then he/she needs to suggest a correct tag for the image. The more misplacements a user has, the less trusted he/she is. By applying this method, spammers and unreliable users can be efficiently detected and eliminated. The user trust value is therefore calculated as the ratio between the number of true flags and all associated flags over all images tagged by that user. The number of misplacements in Panoramio is analogous to the number of wrongly tagged images in our approach.
In case a spammer attacks the system, other users can collaboratively eliminate the spammer. First, the spammer wants to make other users untrusted, so he/she assigns many false flags to the tags given by those users and sets new, wrong tags on these images. In this way, the spammer becomes trusted. Then, other users correct the tags given by the spammer, so that the spammer becomes untrusted and all of his/her feedback in the form of flags is no longer considered in the whole system. Finally, previously trusted users, who were untrusted due to the spammer attack, recover their status. Following this scenario, the user trust ratio can be constructed by making use of the feedback from other users who agree or disagree with the tagged location. However, due to the lack of a suitable dataset which provides user feedback, the evaluation of the user trust scenario is based on a simulation of the social network environment.
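One way to simulate this recovery is to weight each flag by the current trust of the user who cast it and iterate to a fixed point; the users, flags and update rule below are our own illustrative assumptions, not the evaluation protocol from the paper.

```python
# Rough simulation of collaborative spammer elimination: Eve false-flags
# Alice's and Bob's correct tags; Alice and Bob flag Eve's wrong tags.
# Each flag is weighted by the flagger's current trust, so once Eve's
# trust collapses, her attack flags stop counting and the honest users
# recover. All names, flags and the update rule are invented.

# flags[target] = list of (flagger, is_true_flag) judgements on target's tags
flags = {
    "alice": [("bob", True), ("eve", False), ("eve", False)],   # Eve attacks
    "bob":   [("alice", True), ("eve", False), ("eve", False)],
    "eve":   [("alice", False), ("bob", False)],                # corrections
}

def step(trust):
    new = {}
    for user, judgements in flags.items():
        weight = sum(trust[f] for f, _ in judgements)
        agree = sum(trust[f] for f, ok in judgements if ok)
        # No credible flags yet -> keep full trust by default.
        new[user] = agree / weight if weight else 1.0
    return new

trust = {u: 1.0 for u in flags}  # start by trusting everyone
for _ in range(10):
    trust = step(trust)
print(trust)  # Alice and Bob recover full trust; Eve drops to zero
```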
http://www.placecast.net
In this section, an example system using trust modeling is described in order to demonstrate how a social tagging system can benefit from trust modeling. In particular, the geotagging scenario is considered, which is nowadays a very popular application, since a large portion of Internet images in social networks are related to travel. Travel is an important type of event for which people like to share, annotate and search pictures. For the majority of travel images on the Internet, however, proper geographical annotations are not available. In most cases, the images are annotated manually by users. In order to speed up this time-consuming manual tagging process, geotags can be propagated based on the similarity between image content (usually famous landmarks) and context (associated geotags).

Ivanov et al. [24] developed an efficient system for automatic geotag propagation in images by associating locations with distinctive landmarks and using object duplicate detection. The system overview is shown in Figure 4. The robust graph-based object duplicate detection approach reliably establishes the correspondence between a small set of tagged images and a large set of untagged images by searching for the same landmark depicted in different images, in order to propagate geotags from the former to the latter. In tag propagation, trust modeling is especially crucial, because propagating a wrong or spam tag can easily damage the integrity and reliability of the whole system. A user trust model derived for each user is introduced in the geotagging system of [24] by making use of the feedback from other users who agree or disagree with a tag associated with an image.
A new dataset was created in order to evaluate the proposed geotag propagation method. We are interested in images that depict geographically unique landmarks. For instance, pictures taken by tourists are ideal, because they often focus on the unique and interesting landmarks of a place. The dataset was obtained from Google Image Search, Flickr and Wikipedia by querying the tags associated with famous landmarks. The resulting dataset consists of 1320 images: 22 cities (such as Amsterdam, Barcelona, London, Moscow, Paris) and 3 landmarks for each of them (objects or areas in those cities, such as Bird’s Nest Stadium, Sagrada Familia, Reichstag, Golden Gate Bridge, Eiffel Tower). Each landmark has 20 image samples. Fig. 3 shows a single image for a single landmark from each of the 22 considered cities, while Fig. 4 provides several images for 3 selected landmarks (e.g. Berlin – Reichstag, San Francisco – Golden Gate Bridge and Paris – Eiffel Tower). As can be seen from these samples, images with a large variety of viewpoints and distances are considered for each landmark. Fig. 5 shows all cities and landmarks from our dataset.
To compare different user trust models, we first analyze the distribution of their trust values given the tags manually assigned by the human participants (see Section V-B for more details). The values for each trust model were computed as described in Section III. The obtained user trust values were normalized to 1. Then, the trust values were split into five equally spaced histogram bins with the following ranges: 0–0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, and 0.8–1. Figure 7 shows the distribution of the total number of users with trust values in different bins for each trust model. From the results, it can be noted that the distributions for most of the user trust models are not uniform. However, the tags assigned to our dataset by the human participants can be regarded as following a uniform distribution, assuming participants tagged the depicted, generally well-known landmarks without bias. Therefore, a useful, adequate and practical user trust model should also reflect this uniformity in the tags gathered from the participants. From Figure 7, we can see that only two of the five compared user trust models, Koutrika et al. [21] and Ivanov et al. [8], demonstrate this uniformity in their assignment of trust values to the participating users, while the rest of the models mark the majority of users as untrusted.
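The normalize-and-bin step described above can be sketched as follows; the 47 trust values are synthetic stand-ins, since the real values come from the compared trust models.

```python
# Normalize user trust values to [0, 1] and count users in five
# equal-width histogram bins. The values themselves are synthetic.
import random

random.seed(0)
values = [random.random() for _ in range(47)]  # stand-in trust values

m = max(values)
normalized = [v / m for v in values]

bins = [0] * 5
for v in normalized:
    bins[min(int(v * 5), 4)] += 1  # v == 1.0 falls into the last bin

print(bins)  # counts for [0-0.2), [0.2-0.4), [0.4-0.6), [0.6-0.8), [0.8-1]
```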
To understand the reasons for such bias in some of the user trust models, we plot in Figure 8 the trust values given by each approach to all users. The figure shows that trust values vary considerably not only between different users, but also across different models. For example, the trust values of the users numbered 2, 16 and 47 span almost the entire range of the normalized trust value, namely from 0 to 1, across all selected models. Since it is difficult to compare the selected trust models for each user separately, we also grouped the participants according to their background. In our experiments, users are split into 6 categories: researchers (13 users), architects (7 users), engineers (12 users), doctors (4 users), high school students (2 users), and others (9 users who did not indicate their background).
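Grouping the participants by background and aggregating their trust values, as done above, amounts to a simple group-by-and-average; a hedged sketch with hypothetical (category, trust) pairs:

```python
from collections import defaultdict

def mean_trust_by_category(users):
    """users: iterable of (category, trust_value) pairs, e.g.
    ('researcher', 0.8). Returns the mean trust value per category."""
    groups = defaultdict(list)
    for category, trust in users:
        groups[category].append(trust)
    return {c: sum(vals) / len(vals) for c, vals in groups.items()}
```

Comparing the per-category means across models makes the models easier to contrast than inspecting the 47 users individually.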
To analyze the performance of the studied user trust models in the tag propagation scenario, we plot the accuracy of propagation versus the number of propagated tags in Figure 10. We vary the number of propagated tags by adjusting the threshold on the acceptable user trust value for each model. The accuracy of tag propagation is computed as the ratio of the correct tags (based on the ground truth) to the total number of tags assigned (propagated) to images. The maximum number of propagated tags can be much higher than the number of images, since several tags can be assigned to an image by different users, and each tag is propagated to different images. The curves in Figure 10 therefore show a trade-off between propagating tags to more images, but less accurately, and propagating tags to fewer images, but more accurately. The black marker indicates the average tagging accuracy of the system when neither a user trust model nor automated tag propagation is used. In our experiments, it corresponds to the users assigning 47 × 66 = 3102 tags to images (47 users, each of them tagging 66 images at least once). The resulting average tagging accuracy is 52%. This accuracy is equivalent to what Flickr or Panoramio currently achieve, where users simply tag photos independently and no propagation is used.

However, by using automated tag propagation that relies on a trust model based on user feedback, we can improve the accuracy of the tagging system and propagate more tags to the untagged images in the dataset. This improvement is illustrated by the left part of the blue curve (our method), which lies above the average accuracy of 52%. It means that more than 6600 tags (see Figure 10), twice as many as without a trust model, can be propagated from the trusted users while keeping the accuracy above 52%.
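The trade-off curve described above is obtained by sweeping the trust threshold: for each threshold, only tags from sufficiently trusted users are propagated, and the accuracy is the fraction of those tags that match the ground truth. A minimal sketch, with invented (trust, correctness) records rather than the paper's data:

```python
def propagation_tradeoff(tag_records, thresholds):
    """tag_records: list of (user_trust, is_correct) pairs, one per
    candidate propagated tag. For each trust threshold, keep only tags
    from users at or above it and report (tags kept, accuracy)."""
    curve = []
    for t in thresholds:
        kept = [ok for trust, ok in tag_records if trust >= t]
        if kept:
            # Accuracy = correct kept tags / all kept tags.
            curve.append((len(kept), sum(kept) / len(kept)))
    return curve
```

Lower thresholds propagate more tags at lower accuracy; higher thresholds propagate fewer tags at higher accuracy, which is exactly the trade-off the curves in Figure 10 depict.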
Other trust-based methods, such as that of Koutrika et al., also perform well, although they show less impressive results than tag propagation based on our user trust model. The advantage of the algorithm by Koutrika et al., however, is that it is simple and does not need any ground truth or seed data. The methods of Liu et al., Xu et al., and Krestel and Chen are not able to perform well in the tag propagation scenario. Our method shows good performance in this simulated social network environment because the algorithm includes users' tagging behavior, captured through feedback from other people, as an important factor in calculating the trust value, rather than simply relying on the user-contributed tags.
A variety of datasets from different social networks, and even different datasets from the same social network, have been used for the evaluation of trust modeling approaches, as shown in Section III. However, such datasets are rarely published, which makes it difficult to compare the results and performance of different trust modeling approaches. It would therefore be desirable to encourage researchers to make their datasets publicly available to the research community, so that they can be used for comparison and benchmarking of different approaches. Furthermore, most of the datasets provide data for evaluating only one aspect of trust modeling, either user or content trust modeling, while evaluation of the other aspect requires introducing simulated objects into real-world social tagging datasets (e.g. [20], [29]). For a thorough evaluation of a trust model, however, it is necessary that real-world datasets provide ground-truth data for both users and content.
We already noted that a user's trust tends to vary over time according to the user's experience and the evolution of social networks. However, only a few approaches (e.g. [29], [30]) deal with the dynamics of trust, by distinguishing between recent and old tags. Future work on the dynamics of trust would lead to better modeling of this phenomenon in real-world applications.
Most of the existing trust modeling approaches based on text information assume monolingual environments. However, many social network services are used by people from various countries, so that various languages appear simultaneously in tags and comments. In such cases, some text information may be wrongly regarded as incorrect due to the language difference. Incorporating multilingualism into trust modeling would therefore help solve this problem.
Nowadays, interaction across social networks is becoming popular. For example, users can use their Facebook accounts to log in to other social network services. Thus, a future challenge in trust modeling is to investigate how trust models can be effectively connected and shared across domains.
Most of the current techniques for noise and spam reduction focus only on textual tag processing and user profile analysis, while audio and visual content features of multimedia content can also provide useful information about the relevance of the content and the content-tag relationship (e.g. [22]). In the future, a promising research direction would be to combine multimedia content analysis with conventional tag processing and user profile analysis.