Lecture 5: Social Web Data Analysis (2012)
1. Social Web
Lecture 5
How can we MINE, ANALYSE and VISUALISE the Social Web? (1)
Marieke van Erp
The Network Institute
VU University Amsterdam
Monday, March 5, 12
2. Why?
• User-generated content (UGC) provides an enormous wealth of data
• insights into users’ daily lives
• insights into communities
• insights into trends
3. What’s the added value of mining social web data for the individual?
4. To whom it may concern
• Politicians
• Companies
• Governmental institutions
• You?
5. The Age of Big Data
• 25 billion tweets on Twitter in 2010, by 175 million users
• 360 billion pieces of content on Facebook in 2010, by 600 million different users
• 35 hours of video uploaded to YouTube every minute
• 130 million photos uploaded to Flickr per month
6. Questions to Ask
• Who uploads/talks? (age, gender,
nationality, community)
• What are the trending topics?
• What else do these users like?
• Who are the most/least active users?
• etc.
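Two of these questions (most active users, trending topics) can be sketched with simple counting. This is a minimal illustration, not a method from the lecture: the `posts` sample and the function names are hypothetical, and hashtags are used as a rough stand-in for topics.

```python
from collections import Counter
import re

# Hypothetical sample posts as (user, text) pairs; real data would come
# from a platform API or an export.
posts = [
    ("anna", "Loving the #socialweb lecture"),
    ("bob", "New phone day #gadgets"),
    ("anna", "More on #socialweb and #bigdata"),
    ("carl", "#bigdata is everywhere"),
    ("anna", "Coffee first"),
]

def most_active_users(posts, n=3):
    """Rank users by the number of posts they wrote."""
    return Counter(user for user, _ in posts).most_common(n)

def trending_hashtags(posts, n=3):
    """Rank hashtags by frequency, ignoring case."""
    tags = Counter()
    for _, text in posts:
        tags.update(tag.lower() for tag in re.findall(r"#\w+", text))
    return tags.most_common(n)

print(most_active_users(posts, 1))
print(trending_hashtags(posts))
```

Questions about age, gender or community need profile data on top of the posts themselves.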
7. The Rise of the Data Scientist
http://radar.oreilly.com/2010/06/what-is-data-science.html
8. The Rise of the Data Scientist
• Data Science enables the creation of data
products
• Data products are applications that acquire
their value from the data, and create more
data as a result.
• Users are in a feedback loop: they constantly
provide information about the products they
use, which gets used in the data product.
10. Data Mining 101
Data mining is the exploration and analysis of large quantities of
data in order to discover valid, novel, potentially useful, and
ultimately understandable patterns in data.
(Inspired by George Tziralis’ FOSS Conf’09, John Elder IV’s
Salford Systems Data Mining Conf. and Toon Calders’ slides)
http://www.freefoto.com/images/33/12/33_12_7---Pebbles_web.j
11. Data Mining 101
• Databases
• Statistics
• Artificial Intelligence
12. Steps
• Data input & exploration
• Preprocessing
• Data mining algorithms
• Evaluation & Interpretation
13. Data Input & Exploration
• What data do I need to answer question X?
• What variables are in the data?
• Basic stats of my data?
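A first exploration pass might look like the sketch below. The `users` records and their field names are made up for illustration; the point is only to list which variables exist and to compute basic statistics while keeping track of missing values.

```python
import statistics

# Hypothetical records, one dict per user; the field names are assumptions.
users = [
    {"name": "anna", "age": 24, "likes": 120},
    {"name": "bob", "age": 31, "likes": 15},
    {"name": "carl", "age": None, "likes": 48},
    {"name": "dana", "age": 27, "likes": 3},
]

def variables(records):
    """List every variable (key) that occurs in the data."""
    return sorted({key for record in records for key in record})

def basic_stats(records, field):
    """Summarise one numeric field, skipping missing (None) values."""
    values = [r[field] for r in records if r.get(field) is not None]
    return {
        "count": len(values),
        "missing": len(records) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

print(variables(users))
print(basic_stats(users, "likes"))
print(basic_stats(users, "age"))
```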
14. Are all likes equal?
Do they all mean the same?
Do people like for the same reason?
Are ‘likes’ comparable across the different systems?
16. Preprocessing
• Cleanup!
• Choose a suitable data model
• What happens if you integrate data from multiple sources?
• Reformat your data
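The cleanup and integration steps can be sketched as follows. The two sources, their field names and the merge rule are hypothetical. In particular, treating equal lowercased names as the same person is an assumption that often fails on real data, which is exactly the integration risk raised above.

```python
# Hypothetical exports from two platforms with different field names and types.
source_a = [{"user": " Anna ", "likes": "12"}, {"user": "bob", "likes": "7"}]
source_b = [{"username": "ANNA", "like_count": 5}, {"username": "Carl", "like_count": 2}]

def normalise(record, name_key, likes_key, source):
    """Map one raw record onto a shared data model."""
    return {
        "user": record[name_key].strip().lower(),  # cleanup: whitespace and case
        "likes": int(record[likes_key]),           # reformat: string to integer
        "source": source,
    }

def integrate(a, b):
    """Merge both sources per user, remembering where each value came from."""
    rows = [normalise(r, "user", "likes", "a") for r in a]
    rows += [normalise(r, "username", "like_count", "b") for r in b]
    merged = {}
    for row in rows:
        # Assumption: identical normalised names refer to the same person.
        entry = merged.setdefault(row["user"], {"likes": 0, "sources": []})
        entry["likes"] += row["likes"]
        entry["sources"].append(row["source"])
    return merged

print(integrate(source_a, source_b))
```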
18. Data mining algorithms
• Classification: Generalising a known structure and applying it to new data
• Association: Finding relationships between variables
• Clustering: Discovering groups and structures in data
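As one illustration of the association task, the sketch below computes support and confidence for co-occurring likes. The `likes` data is invented, and the two measures are the standard association-rule definitions rather than anything specific to this lecture.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: the set of things each user likes.
likes = {
    "anna": {"coffee", "cycling", "jazz"},
    "bob": {"coffee", "jazz"},
    "carl": {"coffee", "cycling"},
    "dana": {"jazz"},
}

def pair_support(baskets):
    """Fraction of users who like both items of each pair."""
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Of the users who like `antecedent`, the fraction who also like `consequent`."""
    with_antecedent = [items for items in baskets.values() if antecedent in items]
    if not with_antecedent:
        return 0.0
    return sum(consequent in items for items in with_antecedent) / len(with_antecedent)

print(pair_support(likes))
print(confidence(likes, "cycling", "coffee"))
```

Classification and clustering follow the same pattern of input and output but need labelled examples or a distance measure respectively.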
19. How do you know you measured what you wanted to measure?
20. Mining in ‘LikeMiner’
• Filter users by interests
• Construct user graphs
• PageRank on graphs to mine representativeness
• Result: set of influential users
• Compare page topics to user interests to find pages most representative for topics
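The PageRank step can be sketched with plain power iteration. This is a generic textbook version, not LikeMiner's implementation: the example graph, the meaning of an edge (here, "user A points to user B") and the damping value 0.85 are all assumptions.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict mapping node -> list of out-neighbours."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Pass this node's rank on to the nodes it points to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical user graph for one interest group.
user_graph = {
    "anna": ["bob"],
    "bob": ["carl"],
    "carl": ["bob"],
    "dana": ["bob", "carl"],
}

ranks = pagerank(user_graph)
for user in sorted(ranks, key=ranks.get, reverse=True):
    print(user, round(ranks[user], 3))
```

Users with the highest scores would be the candidates for the "set of influential users".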
24. Mining Social Web Data
source: http://kunau.us/wp-content/uploads/2011/02/Screen-shot-2011-02-09-at-9.03.46-PM-w600-h900.png
25. Single Person
Source: http://infosthetics.com/archives/2011/12/all_the_information_facebook_knows_about_you.html
See also: http://www.youtube.com/watch?feature=player_embedded&v=kJvAUqs3Ofg
26. Populations
http://www.brandrants.com/brandrants/obama/
27. Brand Sentiment via Twitter
http://flowingdata.com/2011/07/25/brand-sentiment-showdown/
28. Assignment 3: Data Analysis
• Analyse an existing social data analysis report
• Apply the same analyses to your own data
• Write research report
http://www.actmedia.eu/media/img/text_zones/English/small_38421.jpg
29. Final Assignment: Your SocWeb App
• Create a Social Web app with
your group
• Use structured data,
relationships between entities,
data analysis, visualisation
• Write individual research report
on one of the main aspects of
your app
Image Source: http://blog.compete.com/wp-content/uploads/2012/03/Like.jpg
30. Hands-on Teaser
• Your Facebook Friends’ popularity in a spreadsheet
• Locations of your Facebook Friends
• Tag Cloud of your wall posts
image source: http://www.flickr.com/photos/bionicteaching/1375254387/
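The tag cloud item can be approximated without any Facebook access by counting words and mapping counts to font sizes. The `wall_posts` sample, the stopword list and the size range below are placeholders. Fetching real wall posts is left out because it depends on the platform API.

```python
from collections import Counter
import re

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "to", "and", "is", "of", "in", "on", "my", "for"}

def tag_cloud(posts, min_size=10, max_size=40, top=5):
    """Map the most frequent words to font sizes between min_size and max_size."""
    words = Counter()
    for text in posts:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.update(w for w in tokens if w not in STOPWORDS)
    common = words.most_common(top)
    if not common:
        return {}
    highest, lowest = common[0][1], common[-1][1]
    span = max(highest - lowest, 1)  # avoid division by zero when all counts match
    return {
        word: min_size + (count - lowest) * (max_size - min_size) // span
        for word, count in common
    }

# Hypothetical wall posts.
wall_posts = [
    "Off to the lecture",
    "Great lecture on the social web",
    "Social web data is fun",
]

print(tag_cloud(wall_posts))
```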