Social Web
                                      Lecture 5
                      How can we MINE, ANALYSE and VISUALISE
                                 the Social Web? (1)

                                  Marieke van Erp
                                  The Network Institute
                                 VU University Amsterdam




Monday, March 5, 12
Why?

                      • User-generated content (UGC) provides an
                        enormous wealth of data
                         • insights into users’ daily lives
                         • insights into communities
                         • insights into trends

What’s the added value of mining social web data for the individual?

To whom it may concern
      •      Politicians

      •      Companies

      •      Governmental institutions

      •      You?




The Age of Big Data
                      • 25 billion tweets on Twitter in 2010, by 175
                        million users
                      • 360 billion pieces of content on Facebook
                        in 2010, by 600 million different users
                      • 35 hours of video uploaded to YouTube
                        every minute
                      • 130 million photos uploaded to Flickr per
                        month


Questions to Ask
                      • Who uploads/talks? (age, gender,
                        nationality, community)
                      • What are the trending topics?
                      • What else do these users like?
                      • Who are the most/least active users?
                      • etc.

The Rise of the Data Scientist




                                http://radar.oreilly.com/2010/06/what-is-data-science.html

The Rise of the Data Scientist
                      • Data Science enables the creation of data
                        products
                      • Data products are applications that acquire
                        their value from the data, and create more
                        data as a result.
                      • Users are in a feedback loop: they constantly
                        provide information about the products they
                        use, which is fed back into the data product.

Popular Data Products




Data Mining 101


               Data mining is the exploration and analysis of large quantities of
               data in order to discover valid, novel, potentially useful, and
               ultimately understandable patterns in data.




  (Inspired by George Tziralis’ FOSS Conf’09, John Elder IV’s
 Salford Systems Data Mining Conf. and Toon Calders’ slides)
                                                       http://www.freefoto.com/images/33/12/33_12_7---Pebbles_web.j

Data Mining 101

                      [Figure: Databases, Statistics, Artificial Intelligence]


Steps

                      • Data input & exploration
                      • Preprocessing
                      • Data mining algorithms
                      • Evaluation & Interpretation

Data Input & Exploration

                      • What data do I need to answer question
                        X?
                      • What variables are in the data?
                      • Basic stats of my data?
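
The exploration questions above can be sketched in a few lines of Python. This is only an illustration: the records, the field names (user, page, category) and the explore() helper are assumptions made for the example, not data or code from the lecture.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample: one record per 'like' (field names are assumptions).
likes = [
    {"user": "anna", "page": "VU Amsterdam", "category": "education"},
    {"user": "anna", "page": "Rijksmuseum", "category": "culture"},
    {"user": "bob", "page": "Rijksmuseum", "category": "culture"},
    {"user": "carol", "page": "Rijksmuseum", "category": "culture"},
    {"user": "carol", "page": "Ajax", "category": "sports"},
    {"user": "carol", "page": "VU Amsterdam", "category": "education"},
]


def explore(records):
    """Return the variables present in the data and a few basic statistics."""
    # Which variables are in the data? Collect every field name that occurs.
    variables = sorted({key for record in records for key in record})
    # Basic stats: how many likes does each user contribute?
    likes_per_user = Counter(record["user"] for record in records)
    counts = list(likes_per_user.values())
    return {
        "variables": variables,
        "n_records": len(records),
        "n_users": len(likes_per_user),
        "mean_likes_per_user": mean(counts),
        "median_likes_per_user": median(counts),
        "most_active_user": likes_per_user.most_common(1)[0][0],
    }


summary = explore(likes)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Looking at such a summary first helps to decide whether the data can answer question X at all, before any mining algorithm is applied.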


Are all likes equal?
                             Do they all mean the same?
                       Do people like for the same reason?
                 Are ‘likes’ comparable across the different systems?

Input & Exploration in ‘LikeMiner’




Preprocessing

                      • Cleanup!
                      • Choose a suitable data model
                           • What happens if you integrate data
                              from multiple sources?
                      • Reformat your data
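
A minimal sketch of these preprocessing steps, assuming two hypothetical exports that use different field names for the same information. All names here (source_a, source_b, to_common_model, integrate) are invented for the illustration. Note that the sketch treats equal user names in both sources as the same person, which is exactly the kind of assumption that needs checking when integrating data from multiple sources.

```python
def normalise_name(raw):
    """Clean up a name: trim, collapse repeated whitespace, lowercase."""
    return " ".join(raw.split()).lower()


# Hypothetical exports from two sources with different field names.
source_a = [
    {"uid": "anna", "liked": "  Rijksmuseum "},
    {"uid": "bob", "liked": "rijksmuseum"},
]
source_b = [
    {"account": "Anna", "favourite": "RIJKSMUSEUM"},
    {"account": "carol", "favourite": "Van  Gogh Museum"},
]


def to_common_model(record, user_key, item_key, source):
    """Reformat one source-specific record into a shared data model."""
    return {
        "user": normalise_name(record[user_key]),
        "item": normalise_name(record[item_key]),
        "source": source,
    }


def integrate(records_a, records_b):
    """Merge both sources and drop duplicate (user, item) pairs."""
    merged = [to_common_model(r, "uid", "liked", "a") for r in records_a]
    merged += [to_common_model(r, "account", "favourite", "b") for r in records_b]
    seen = set()
    unique = []
    for record in merged:
        key = (record["user"], record["item"])
        if key not in seen:  # keep only the first occurrence of each pair
            seen.add(key)
            unique.append(record)
    return unique


clean = integrate(source_a, source_b)
for record in clean:
    print(record)
```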

Preprocessing in ‘LikeMiner’




Data mining algorithms

                      • Classification: Generalising a known
                        structure and applying it to new data
                      • Association: Finding relationships between
                        variables
                      • Clustering: Discovering groups and
                        structures in data
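
To make one of these three families concrete, here is a minimal sketch of association mining: it counts how often pairs of interests occur together and reports support and confidence for each pair. The interest sets, the pair_rules() helper and the 0.4 support threshold are assumptions chosen for the illustration; real association miners (for example Apriori-style algorithms) handle larger itemsets and far more data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: the set of interests liked by each of five users.
interest_sets = [
    {"football", "beer", "music"},
    {"football", "beer"},
    {"music", "museums"},
    {"football", "beer", "museums"},
    {"music", "beer"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of users who like both items
        if support < min_support:
            continue
        # Confidence of "a -> b": of the users who like a, how many like b?
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; names break ties for a stable order.
    return sorted(rules, key=lambda rule: (-rule[3], rule[0], rule[1]))


rules = pair_rules(interest_sets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```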



How do you know you measured what you wanted to measure?

Mining in ‘LikeMiner’
                      •   Filter users by interests

                      •   Construct user graphs

                      •   PageRank on graphs to mine
                          representativeness

                      •   Result: set of influential users

                      •   Compare page topics to
                          user interests to find pages
                          most representative for
                          topics
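
The PageRank step can be illustrated with a small power-iteration sketch. This is a generic textbook PageRank, not LikeMiner's actual implementation: the slides do not say how LikeMiner builds its user graphs, so the graph below, the meaning of an edge (u points to v when u endorses v) and the damping factor of 0.85 are all assumptions.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on a dict of node -> list of targets."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every node passes its rank on in equal shares to its targets.
        for node in nodes:
            targets = graph.get(node, [])
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Hypothetical user graph for one interest (edge u -> v: u endorses v).
user_graph = {
    "anna": ["carol"],
    "bob": ["carol"],
    "carol": ["anna"],
    "dave": ["carol", "anna"],
}

ranks = pagerank(user_graph)
influential = sorted(ranks, key=ranks.get, reverse=True)
for user in influential:
    print(f"{user}: {ranks[user]:.3f}")
```

Sorting the users by rank gives a candidate "set of influential users" for the chosen interest; how representative that set really is depends on how the graph was constructed.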




Interpreting your results




Data Mining is not easy




Mining Social Web Data




                         source: http://kunau.us/wp-content/uploads/2011/02/Screen-shot-2011-02-09-at-9.03.46-PM-w600-h900.png

Single Person




                       Source: http://infosthetics.com/archives/2011/12/all_the_information_facebook_knows_about_you.html
                       See also: http://www.youtube.com/watch?feature=player_embedded&v=kJvAUqs3Ofg

Populations




                         http://www.brandrants.com/brandrants/obama/

Brand Sentiment via Twitter




                http://flowingdata.com/2011/07/25/brand-sentiment-showdown/

Assignment 3: Data Analysis


             •        Analyse an existing social data
                      analysis report
             •        Apply the same analyses to your
                      own data
             •        Write a research report


                                        http://www.actmedia.eu/media/img/text_zones/English/small_38421.jpg

Final Assignment: Your SocWeb App

             •        Create a Social Web app with
                      your group
             •        Use structured data,
                      relationships between entities,
                      data analysis, visualisation
             •        Write an individual research report
                      on one of the main aspects of
                      your app
                                             Image Source: http://blog.compete.com/wp-content/uploads/2012/03/Like.jpg

Hands-on Teaser

             •        Your Facebook Friends’
                      popularity in a spreadsheet
             •        Locations of your Facebook
                      Friends
             •        Tag Cloud of your wall posts


                                                     image source: http://www.flickr.com/photos/bionicteaching/1375254387/
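
As a taste of the tag cloud exercise, the sketch below turns a handful of wall posts into word weights that could drive font sizes. The posts, the stopword list and the tag_weights() helper are assumptions made for the illustration; fetching real wall posts from Facebook is left out.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption; real lists are longer).
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "at", "for", "on", "my"}


def tag_weights(posts, top_n=5, min_size=10, max_size=40):
    """Map the top_n most frequent words to font sizes between min and max."""
    words = []
    for post in posts:
        tokens = re.findall(r"[a-z']+", post.lower())
        words += [token for token in tokens if token not in STOPWORDS]
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = max(highest - lowest, 1)  # avoid dividing by zero when all counts tie
    return {
        word: min_size + (count - lowest) * (max_size - min_size) // span
        for word, count in counts
    }


# Hypothetical wall posts.
wall_posts = [
    "Great lecture on the Social Web today",
    "Mining the social web for my assignment",
    "Social data mining is fun",
]

weights = tag_weights(wall_posts)
for word, size in weights.items():
    print(f"{word}: {size}pt")
```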

 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Lecture 5: Social Web Data Analysis (2012)

  • 1. Social Web Lecture 5 How can we MINE, ANALYSE and VISUALISE the Social Web? (1) Marieke van Erp The Network Institute VU University Amsterdam Monday, March 5, 12
  • 2. Why? • UGC (user-generated content) provides an enormous wealth of data • insights into users’ daily lives • insights into communities • insights into trends
  • 3. What’s the added value of mining social web data for the individual?
  • 4. To whom it may concern • Politicians • Companies • Governmental institutions • You?
  • 5. The Age of Big Data • 25 billion tweets on Twitter in 2010, by 175 million users • 360 billion pieces of content on Facebook in 2010, by 600 million different users • 35 hours of video uploaded to YouTube every minute • 130 million photos uploaded to Flickr per month
  • 6. Questions to Ask • Who uploads/talks? (age, gender, nationality, community) • What are the trending topics? • What else do these users like? • Who are the most/least active users? • etc.
  • 7. The Rise of the Data Scientist http://radar.oreilly.com/2010/06/what-is-data-science.html
  • 8. The Rise of the Data Scientist • Data Science enables the creation of data products • Data products are applications that acquire their value from the data, and create more data as a result. • Users are in a feedback loop: they constantly provide information about the products they use, which gets used in the data product.
  • 10. Data Mining 101 Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in data. (Inspired by George Tziralis’ FOSS Conf’09, John Elder IV’s Salford Systems Data Mining Conf. and Toon Calders’ slides) http://www.freefoto.com/images/33/12/33_12_7---Pebbles_web.j
  • 11. Data Mining 101 Databases Statistics Artificial Intelligence
  • 12. Steps • Data input & exploration • Preprocessing • Data mining algorithms • Evaluation & Interpretation
  • 13. Data Input & Exploration • What data do I need to answer question X? • What variables are in the data? • What are the basic stats of my data?
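
As an illustration of the "basic stats" question on the exploration slide, here is a minimal sketch in Python. The records, user names and page names are hypothetical and are not taken from the lecture; the point is only to show what a first look at a small set of "like" records might compute.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records: (user, page liked). Not taken from the lecture.
likes = [
    ("ann", "VU Amsterdam"),
    ("ann", "Rijksmuseum"),
    ("bob", "Rijksmuseum"),
    ("cas", "Rijksmuseum"),
    ("cas", "VU Amsterdam"),
    ("cas", "Flickr"),
]


def basic_stats(records):
    """Return simple descriptive statistics for (user, page) like records."""
    per_user = Counter(user for user, _ in records)
    per_page = Counter(page for _, page in records)
    counts = list(per_user.values())
    return {
        "n_records": len(records),
        "n_users": len(per_user),
        "n_pages": len(per_page),
        "mean_likes_per_user": mean(counts),
        "median_likes_per_user": median(counts),
        "most_liked_page": per_page.most_common(1)[0],
    }


stats = basic_stats(likes)
```

Counts per user and per page are usually enough to spot skew (a few very active users, a few very popular pages) before any mining algorithm is chosen.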
  • 14. Are all likes equal? Do they all mean the same? Do people like for the same reason? The ‘likes’ across the different systems?
  • 15. Input & Exploration in ‘LikeMiner’
  • 16. Preprocessing • Cleanup! • Choose a suitable data model • What happens if you integrate data from multiple sources? • Reformat your data
  • 17. Preprocessing in ‘LikeMiner’
  • 18. Data mining algorithms • Classification: Generalising a known structure and applying it to new data • Association: Finding relationships between variables • Clustering: Discovering groups and structures in data
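
To make the "Association" family on the slide above a little more concrete, the following sketch counts how often interests co-occur and estimates the confidence of a simple rule A → B. The baskets and interest names are made up for illustration, and this is only one basic way to look for relationships between variables, not a method prescribed by the lecture.

```python
from collections import Counter
from itertools import combinations

# Hypothetical "baskets": the set of interests each user has liked.
baskets = [
    {"museum", "art", "travel"},
    {"museum", "art"},
    {"art", "music"},
    {"museum", "travel"},
]


def pair_support(baskets):
    """Count how many baskets contain each item and each unordered item pair."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))
    return item_counts, pair_counts


def confidence(antecedent, consequent, item_counts, pair_counts):
    """Estimate P(consequent | antecedent) from the basket counts."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


item_counts, pair_counts = pair_support(baskets)
conf_museum_art = confidence("museum", "art", item_counts, pair_counts)
conf_travel_museum = confidence("travel", "museum", item_counts, pair_counts)
```

In this toy data every user who likes "travel" also likes "museum", so that rule has confidence 1.0, while "museum" → "art" holds for two of the three museum fans.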
  • 19. How do you know you measured what you wanted to measure?
  • 20. Mining in ‘LikeMiner’ • Filter users by interests • Construct user graphs • Run PageRank on the graphs to mine representativeness • Result: a set of influential users • Compare page topics to user interests to find the pages most representative of each topic
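
The PageRank step mentioned on the LikeMiner slide can be sketched as a plain power iteration. This is a generic textbook version under stated assumptions: the slides do not say how LikeMiner builds or weights its user graphs, so the graph below, the edge meaning, and the damping factor of 0.85 are all illustrative choices.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping node -> list of out-neighbours.

    Every neighbour must also appear as a key in the dict.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical user graph: an edge u -> v means "u interacts with v".
user_graph = {
    "ann": ["cas"],
    "bob": ["cas"],
    "cas": ["ann"],
    "dan": ["cas"],
}
ranks = pagerank(user_graph)
most_influential = max(ranks, key=ranks.get)
```

Users who receive many incoming edges, or edges from other highly ranked users, end up with the highest scores, which is one way to read "influential" in the slide.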
  • 21. Interpreting your results
  • 22. Data Mining is not easy
  • 24. Mining Social Web Data Source: http://kunau.us/wp-content/uploads/2011/02/Screen-shot-2011-02-09-at-9.03.46-PM-w600-h900.png
  • 25. Single Person Source: http://infosthetics.com/archives/2011/12/all_the_information_facebook_knows_about_you.html See also: http://www.youtube.com/watch?feature=player_embedded&v=kJvAUqs3Ofg
  • 26. Populations http://www.brandrants.com/brandrants/obama/
  • 27. Brand Sentiment via Twitter http://flowingdata.com/2011/07/25/brand-sentiment-showdown/
  • 28. Assignment 3: Data Analysis • Analyse an existing social data analysis report • Apply the same analyses to your own data • Write a research report http://www.actmedia.eu/media/img/text_zones/English/small_38421.jpg
  • 29. Final Assignment: Your SocWeb App • Create a Social Web app with your group • Use structured data, relationships between entities, data analysis, visualisation • Write an individual research report on one of the main aspects of your app Image source: http://blog.compete.com/wp-content/uploads/2012/03/Like.jpg
  • 30. Hands-on Teaser • Your Facebook friends’ popularity in a spreadsheet • Locations of your Facebook friends • Tag cloud of your wall posts Image source: http://www.flickr.com/photos/bionicteaching/1375254387/
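
The tag cloud in the hands-on teaser boils down to counting words, since a tag cloud sizes each word by its frequency. The sketch below shows only that counting step; the example posts and the tiny stopword list are invented, and fetching real wall posts from Facebook is deliberately left out.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = {"the", "a", "to", "of", "and", "is", "in", "at"}


def tag_weights(posts, top_n=5):
    """Return the top_n most frequent non-stopword words with their counts."""
    words = []
    for post in posts:
        tokens = re.findall(r"[a-z']+", post.lower())
        words.extend(token for token in tokens if token not in STOPWORDS)
    return Counter(words).most_common(top_n)


# Hypothetical wall posts, used only to exercise the function.
wall_posts = [
    "Off to the museum today",
    "The museum was great",
    "Great coffee at the museum cafe",
]
top_tags = tag_weights(wall_posts, top_n=2)
```

The resulting (word, count) pairs can be handed to any visualisation tool that scales font size by count.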