Recommendations and Discovery
      at StumbleUpon
           Sumanth Kolar,
        Director, Engineering

                                @_5K
StumbleUpon’s Mission

Help users find content they did not expect to find
 Be the best way to discover new
and interesting things from across
             the Web.
How StumbleUpon works
1. Register
2. Tell us your interests
3. Start Stumbling and rating web pages

We use your interests and behavior to recommend new content for you!
Discovery is very different from search


Discovery at StumbleUpon                  Search
     Serendipitous                     Intent driven
      One at a time                   List of articles
     Never repeats                    Always repeats
   Constantly adapting                 Fixed results
     Tailored for you                  Impersonal

    There is an ongoing shift from search to discovery
StumbleUpon
StumbleUpon Overview
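[Diagram: (1) URLs arrive through user discovery, automated sources, and crawling, and enter the ingestion pipeline; (2) each URL goes through sampling and a "Pass?" check; (3) on "Yes", the URL reaches the URL Index and the Rec Engine.]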
What are the key challenges to
  good recommendations?
Pillars of good recommendations
    Understand who the user is and what he is
                 interested in.

       Separate good content from the bad.

   Explore various techniques for matching users
                     to content.

        Learn from your recommendations.
Pillars of good recommendations: understand who the user is and what he is interested in.
User self reports topics of interest
Part of the sign up flow…
User’s Interest Graph

[Diagram: the User node links to Food/Cooking (which links to Italian Recipes) and to Cars (which links to Vintage Cars).]
Continually Enhance a User’s Interest Graph




Analyze user’s StumbleUpon history to expand on
interest preferences:
   • Add/remove topics
   • Follow/block particular domains
Continually Enhance a User’s Interest Graph


                    Leverage social network
                    data:
                       • Find friends & people
                         to follow
                       • Find content trending
                         in your social circles
                       • Find additional
                         interests
Continually Enhance a User’s Interest Graph




               Mine internal StumbleUpon
               rating and sharing data to
               suggest other stumblers and
               topics.
Enhanced Interest Graph

[Diagram: the User node now links to Friends, News, Trending, Food/Cooking (with Italian Recipes), Cars (with Vintage Cars), and the domains nasa.gov and 1x.com.]
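
One way to picture the interest graph is as a small adjacency map rooted at the user. The sketch below is purely illustrative: the `InterestGraph` class, the edge `kind` labels, the weights, and the parent/child links between topics are assumptions read off the diagram, not StumbleUpon's actual data model.

```python
from collections import defaultdict


class InterestGraph:
    """Toy interest graph: nodes are strings, edges carry a kind and a weight."""

    def __init__(self):
        # node -> {neighbor: (kind, weight)}
        self.edges = defaultdict(dict)

    def add_edge(self, src, dst, kind, weight=1.0):
        self.edges[src][dst] = (kind, weight)

    def neighbors(self, node, kind=None):
        """Return neighbors of `node`, optionally filtered by edge kind."""
        return sorted(
            dst for dst, (k, _) in self.edges[node].items()
            if kind is None or k == kind
        )

    def reachable(self, node):
        """All nodes reachable from `node` (depth-first), excluding itself."""
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            for nxt in self.edges[current]:
                if nxt not in seen and nxt != node:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


graph = InterestGraph()
# Explicit interests reported at sign-up.
graph.add_edge("user", "Food/Cooking", kind="topic")
graph.add_edge("Food/Cooking", "Italian Recipes", kind="subtopic")
graph.add_edge("user", "Cars", kind="topic")
graph.add_edge("Cars", "Vintage Cars", kind="subtopic")
# Enhancements inferred from history and social data.
graph.add_edge("user", "nasa.gov", kind="domain")
graph.add_edge("user", "1x.com", kind="domain")
graph.add_edge("user", "Friends", kind="source")
graph.add_edge("user", "News", kind="source")
graph.add_edge("user", "Trending", kind="source")

print(graph.neighbors("user", kind="domain"))  # ['1x.com', 'nasa.gov']
```

Keeping the edge kind on each link lets a recommender treat self-reported topics differently from inferred domains or social sources.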
Pillars of good recommendations: separate good content from the bad.
Sampling

On average, hundreds of URLs are ingested into the
StumbleUpon pipeline every minute.

• Sampling key goals:
   1. Determine which URLs to sample and which to skip
      completely
   2. Examine sampling results to identify good URLs

• URL features used when sampling:
   • Known domain performance (ratings, time spent)
   • Content-related features (#images, #ads, URL length, etc.)
   • User features of the discoverer (spammer vs. trusted user)
Recommendations at StumbleUpon: Sampling

[Diagram: Webpage → Random Forest (individual trees answer Yes / No / Yes) → Vote (Yes) → Recommend → Classifier based on User Feedback (Timespent, Ratings) → Yes]

User feedback shown in the diagram:

   Rating      Timespent
   Good        35sec
   Good        22sec
   Bad         15sec
   Good        45sec
   Good        14sec
   Good        28sec
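
The two-stage flow in this diagram can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the three hand-written rules stand in for trained decision trees, and the feature names, thresholds, and function names are hypothetical. Only the feedback rows come from the slide.

```python
from statistics import median


# Each "tree" is a hand-written rule standing in for a trained decision tree.
def tree_domain(url):
    return url["domain_avg_rating"] >= 0.6


def tree_content(url):
    return url["num_ads"] <= 3 and url["num_images"] >= 1


def tree_discoverer(url):
    return not url["discoverer_is_spammer"]


FOREST = [tree_domain, tree_content, tree_discoverer]


def should_sample(url, forest=FOREST):
    """Majority vote across trees: sample the URL only if most trees say yes."""
    yes_votes = sum(1 for tree in forest if tree(url))
    return yes_votes * 2 > len(forest)


def passes_sampling(feedback, min_good_share=0.7, min_median_seconds=20):
    """Judge a sampled URL from (rating, seconds) pairs collected from users."""
    if not feedback:
        return False
    good_share = sum(1 for rating, _ in feedback if rating == "Good") / len(feedback)
    median_seconds = median(seconds for _, seconds in feedback)
    return good_share >= min_good_share and median_seconds >= min_median_seconds


# Hypothetical candidate URL features.
candidate = {
    "domain_avg_rating": 0.8,
    "num_ads": 5,
    "num_images": 4,
    "discoverer_is_spammer": False,
}
# Feedback rows taken from the slide's example.
feedback = [("Good", 35), ("Good", 22), ("Bad", 15),
            ("Good", 45), ("Good", 14), ("Good", 28)]

if should_sample(candidate):
    print("passes sampling:", passes_sampling(feedback))
```

Here two of the three rules vote yes, so the URL is shown to a sample of users; the collected ratings and time spent then decide whether it passes.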
Leveraging In-Network Experts


•   Users who thumb-up good content and
    thumb-down bad content
•   For example:
    –   Joe DiMaggio – Baseball
    –   Julia Child – Food/Cooking
    –   Da Vinci – Art and Architecture
•   Ratings from Experts are more trustworthy
    and earn more weight.
Recommendations at StumbleUpon: Experts

[Figure: two plots of P(Thumb Up | Page Quality) against Page Quality, one for a non-expert and one for an expert.]
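
The idea that expert ratings earn more weight can be sketched as a weighted vote. The function name and the weight of 3.0 are assumptions chosen for illustration; the slides do not say how much extra weight experts receive.

```python
def weighted_like_score(ratings, expert_weight=3.0):
    """Combine thumb ratings into a score in [0, 1].

    `ratings` is a list of (is_thumb_up, is_expert) pairs. Expert ratings
    count `expert_weight` times as much as non-expert ratings.
    """
    total = 0.0
    liked = 0.0
    for is_thumb_up, is_expert in ratings:
        weight = expert_weight if is_expert else 1.0
        total += weight
        if is_thumb_up:
            liked += weight
    return liked / total if total else 0.0


ratings = [
    (True, False),   # non-expert thumb-up
    (True, False),   # non-expert thumb-up
    (False, True),   # expert thumb-down
]
print(weighted_like_score(ratings))                     # experts count 3x
print(weighted_like_score(ratings, expert_weight=1.0))  # everyone counts equally
```

With equal weights the page looks mostly liked; once the expert's thumb-down counts three times, the score drops below one half.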
Pillars of good recommendations: explore various techniques for matching users to content.
Challenge: User expectations are different

“I LOVE cars!”          “Me too!”
  -Anonymous Stumbler     -Another Stumbler
Like-Minded Users



•   Find users who like content
    similar to the content you like
•   Signals can be ratings, time
    spent, interests, etc.
•   Use the content they’ve liked
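
A minimal sketch of the like-minded idea, assuming each user is represented as a sparse vector of per-topic signal strengths and compared with cosine similarity. The user names, vectors, URLs, and the choice of cosine similarity are all assumptions for illustration, not the method described in the talk.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_from_like_minded(target, users, liked_pages, top_n=1):
    """Suggest pages liked by the most similar users that the target has not liked."""
    others = [(cosine_similarity(users[target], vec), name)
              for name, vec in users.items() if name != target]
    others.sort(reverse=True)
    suggestions = []
    for _, name in others[:top_n]:
        for page in liked_pages[name]:
            if page not in liked_pages[target] and page not in suggestions:
                suggestions.append(page)
    return suggestions


# Hypothetical per-topic signal strengths (e.g. derived from ratings or time spent).
users = {
    "alice": {"cars": 1.0, "space": 0.5},
    "bob": {"cars": 0.9, "space": 0.6, "movies": 0.1},
    "carol": {"cooking": 1.0, "movies": 0.4},
}
liked_pages = {
    "alice": ["vintage-cars.example/a"],
    "bob": ["vintage-cars.example/a", "space.example/b"],
    "carol": ["recipes.example/c"],
}
print(recommend_from_like_minded("alice", users, liked_pages))  # ['space.example/b']
```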
PLSI based like-minded

[Diagram: three interest lists are mapped onto shared latent topics (Movies, Cars, Space, Science):
   • Vintage Cars, Action movies, Astronomy, Robotics
   • Astronomy, Space Exploration, Physics, Classic Movies
   • Neuroscience, Astronomy, Space Exploration, Comedy Movies]
Like-Minded Users: Challenges Scaling


   Total Pairwise Similarity Calculations
     = 50K users * 5 million users * 1K features
     = 250 Trillion
   • Probabilistic Latent Semantic Indexing (PLSI)
     based similarity over 500 trillion calculations
   • PLSI based similarity framework computes in
     less than an hour
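
The arithmetic on this slide can be checked directly. The variable names are illustrative (the slide does not explain why the two user counts differ), and the throughput figure is only a lower bound implied by "500 trillion calculations in less than an hour".

```python
users_left = 50_000       # first user count on the slide
users_right = 5_000_000   # second user count on the slide
features = 1_000          # features per user on the slide

pairwise_ops = users_left * users_right * features
print(f"{pairwise_ops:,}")  # 250,000,000,000,000 (250 trillion)

# Lower bound on throughput if 500 trillion calculations finish within an hour.
plsi_ops = 500 * 10**12
seconds_per_hour = 3600
min_ops_per_second = plsi_ops / seconds_per_hour
print(f"{min_ops_per_second:.3e}")
```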
Grow User’s Interest Graph: Implicit + Explicit

[Diagram: the User node links to Experts, Friends, Likeminded Users, News, Trending, Food/Cooking (with Italian Recipes), Cars (with Vintage Cars), and the domains nasa.gov and 1x.com.]
Different methods perform differently for different users at different times

[Chart: stacked bars (0% to 100%) for User 1 through User 5, broken down by method: Trending, Follow, Bias domains, Experts, News, Like-minded.]
Recommendation context
Pillars of good recommendations: learn from your recommendations.
Two Main Signals from Recommendation


   Rating                       Time Spent




            Both present numerous challenges . . .
Ratings: volume decay

Users rate more during their initial experience.

[Chart: # Ratings (y-axis) against Time (x-axis).]

Why is this happening?
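Time Spent

[Figure: a sequence of stumbles with different content types and times spent: Video (T1 sec), Images (T2 sec), Video (T3 sec), Text (T4 sec), Images (T5 sec); some are marked with "?".]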

• Ratings are sparse
   • < 10% of recommendations have explicit ratings.
• Use time spent to decide whether the stumble was skipped
• Time spent on videos is longer than on images.
• Solution: Estimate p(Like | Timespent)
   • Model based on user and content patterns
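
One simple way to estimate p(Like | Timespent) is an empirical rate per content type and time bucket, smoothed so that sparse buckets do not produce extreme values. The sketch below assumes that approach; the class name, bucket width, smoothing constant, and training rows are hypothetical, and the talk does not specify the actual model.

```python
from collections import defaultdict


def bucket(seconds, width=10, max_bucket=6):
    """Map a time spent in seconds to a coarse bucket index."""
    return min(int(seconds // width), max_bucket)


class LikeGivenTimespent:
    """Empirical p(Like | content type, time bucket) with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.likes = defaultdict(float)
        self.total = defaultdict(float)

    def observe(self, content_type, seconds, liked):
        key = (content_type, bucket(seconds))
        self.total[key] += 1
        if liked:
            self.likes[key] += 1

    def predict(self, content_type, seconds):
        key = (content_type, bucket(seconds))
        # With no data for this key the estimate falls back to 0.5.
        return (self.likes[key] + self.alpha) / (self.total[key] + 2 * self.alpha)


model = LikeGivenTimespent()
# Hypothetical rated stumbles: (content type, seconds, explicit like?)
rated = [
    ("image", 12, True), ("image", 15, True), ("image", 18, False),
    ("video", 12, False), ("video", 15, False), ("video", 95, True),
]
for content_type, seconds, liked in rated:
    model.observe(content_type, seconds, liked)

# The same 15 seconds means different things for an image and a video.
print(model.predict("image", 15))  # 0.6
print(model.predict("video", 15))  # 0.25
```

Conditioning on content type is what lets a short view of an image count as a likely like while the same duration on a video reads as a probable skip.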
Challenges: Time spent on different devices

[Chart: median time spent per stumble (y-axis) against 5th percentile time spent per stumble (x-axis) for three clients: Stumble Bar, Mobile / Tablets, Installed plugin.]
How do we know we are doing a good job?
Extensive A/B Testing

A/B tests on metrics such as session length, retention, rating behavior, etc.
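
For a rate metric such as retention, one common way to read an A/B test is a two-proportion z-test. The sketch below assumes that test and uses made-up counts; the slides do not say which statistical procedure StumbleUpon used.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical counts: users retained after a week in control (A) vs. variant (B).
z, p_value = two_proportion_z_test(success_a=400, n_a=1000, success_b=460, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the retention difference is unlikely to be noise; metrics like session length would need a test suited to continuous data instead.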
Measurable Improvements In Rec Quality

[Chart: Normalized Likes vs Dislikes, Dec-08 through Aug-12 (y-axis 0 to 16), with a trend line (R² = 0.736); annotation on recent months: +111% improvement!]
Many other interesting problems…

•   Dupe detection
•   Anti-spam
•   News
•   Topic classification
•   Metrics, quality analysis
•   Trending
•   Search
•   User biases, mood
•   Many more…

We are HIRING!!!

So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 

Recommendations and Discovery at StumbleUpon

  • 1. Recommendations and Discovery at StumbleUpon Sumanth Kolar, Director, Engineering @_5K
  • 2. StumbleUpon’s Mission Help users find content they did not expect to find Be the best way to discover new and interesting things from across the Web.
  • 3. How StumbleUpon works 1. Register 2. Tell us your interests 3. Start Stumbling and rating web pages We use your interests and behavior to recommend new content for you!
  • 4. Discovery is very different from search. Discovery at StumbleUpon vs. Search: serendipitous vs. intent driven; one at a time vs. list of articles; never repeats vs. always repeats; constantly adapting vs. fixed results; tailored for you vs. impersonal. There is an ongoing shift from search to discovery.
  • 6. StumbleUpon Overview [diagram: (1) Discovery: URLs from users and from automated crawling; (2) Ingestion Pipeline and Sampling, with a "Pass?" check; (3) on "Yes", the URL Index and Rec Engine]
  • 7. What are the key challenges to good recommendations?
  • 8. Pillars of good recommendations Understand who the user is and what he is interested in. Separate good content from the bad. Explore various techniques for matching users to content. Learn from your recommendations.
  • 9. Pillars of good recommendations Understand who the user is and what he is interested in. Separate good content from the bad. Explore various techniques for matching users to content. Learn from your recommendations.
  • 10. User self-reports topics of interest. Part of the sign-up flow…
  • 11. User’s Interest Graph [diagram: User linked to Food/Cooking (Italian Recipes) and Cars (Vintage Cars)]
  • 12. Continually Enhance a User’s Interest Graph Analyze user’s StumbleUpon history to expand on interest preferences: • Add/remove topics • Follow/block particular domains
  • 13. Continually Enhance a User’s Interest Graph Leverage social network data: • Find friends & people to follow • Find content trending in your social circles • Find additional interests
  • 14. Continually Enhance a User’s Interest Graph Mine internal StumbleUpon rating and sharing data to suggest other stumblers, topics.
  • 15. Enhanced Interest Graph [diagram: User linked to Food/Cooking (Italian Recipes), Cars (Vintage Cars), Friends, News, Trending, and the domains nasa.gov and 1x.com]
  • 16. Pillars of good recommendations Understand who the user is and what he is interested in. Separate good content from the bad. Explore various techniques for matching users to content. Learn from your recommendations.
  • 17. Sampling. On average, hundreds of URLs are ingested into the StumbleUpon pipeline every minute. Key goals of sampling: (1) determine which URLs to sample and which to skip completely; (2) examine sampling results to identify good URLs. URL features used when sampling: known domain performance (ratings, time spent); content-related features (number of images, number of ads, URL length, etc.); user features of the discoverer (spammer vs. trusted user).
  • 18. Recommendations at StumbleUpon: Sampling. Classifier based on user feedback (time spent, ratings): a random forest votes on whether to recommend a webpage. [Diagram: sampled feedback for a webpage (Good 35 sec, Good 22 sec, Bad 15 sec, Good 45 sec, Good 14 sec, Good 28 sec) with Yes/No votes]
  • 19. Leveraging In-Network Experts. Experts are users who thumb-up good content and thumb-down bad content. For example: Joe DiMaggio (Baseball), Julia Child (Food/Cooking), Da Vinci (Art and Architecture). Ratings from experts are more trustworthy and earn more weight.
  • 20. Recommendations at StumbleUpon: Experts [two plots, Non-Expert and Expert, each showing P(Thumb Up | Page Quality) against Page Quality]
  • 21. Pillars of good recommendations Understand who the user is and what he is interested in. Separate good content from the bad. Explore various techniques for matching users to content. Learn from your recommendations.
  • 22. Challenge: User expectations are different. “I LOVE cars!” (Anonymous Stumbler) “Me too!” (Another Stumbler)
  • 23. Like-Minded Users • Find users who like content similar to the content you do • Signals can be ratings, time spent, interests, etc. • Use the content they’ve liked
  • 24. PLSI-based like-minded [diagram: users' interests grouped into latent topics; labels include Vintage Cars, Cars, Action Movies, Classic Movies, Comedy Movies, Movies, Astronomy, Space Exploration, Space, Robotics, Physics, Neuroscience, Science]
  • 25. Like-Minded Users: Challenges. Scaling: • Total pairwise similarity calculations = 50K users × 5 million users × 1K features = 250 trillion • Probabilistic Latent Semantic Indexing (PLSI) based similarity over 500 trillion calculations • PLSI-based similarity framework computes in less than an hour
  • 26. Grow User’s Interest Graph: Implicit + Explicit [diagram: the user's interest graph (Food/Cooking, Italian Recipes, Cars, Vintage Cars, nasa.gov, 1x.com) extended with Experts, Friends, Like-minded Users, News, and Trending]
  • 27. Different methods perform differently for different users at different times [stacked bar chart, 0% to 100%, for User 1 through User 5; methods: Trending, Follow, Bias domains, Experts, News, Like-minded]
  • 29. Pillars of good recommendations Understand who the user is and what he is interested in. Separate good content from the bad. Explore various techniques for matching users to content. Learn from your recommendations.
  • 30. Two Main Signals from Recommendation: Rating and Time Spent. Both present numerous challenges...
  • 31. Ratings: volume decay. Users rate more during their initial experience. [Plot: # Ratings vs. Time] Why is this happening?
  • 32. Time Spent [diagram: a sequence of image, video, and text stumbles with time spent T1 to T5 sec, some with unknown ratings] • Ratings are sparse: < 10% of recommendations have explicit ratings. • Use time spent to decide whether the stumble was skipped. • Time spent on videos is longer than on images. • Solution: estimate p(Like | Timespent) with a model based on user and content patterns.
  • 33. Challenges: Time spent on different devices [chart: median and 5th-percentile time spent per stumble for Stumble Bar, Mobile / Tablets, and Installed plugin]
  • 34. Pillars of good recommendations Understand who the user is and what he is interested in. Separate good content from the bad. Explore various techniques for matching users to content. Learn from your recommendations.
  • 35. How do we know we are doing a good job?
  • 36. Extensive A/B Testing. A/B tests on metrics such as session length, retention, rating behavior, etc.
  • 37. Measurable Improvements in Rec Quality [chart: Normalized Likes vs Dislikes, Dec-08 to Aug-12, y-axis 0 to 16, trend line R² = 0.736; annotation: +111% improvement in recent months]
  • 38. Many other interesting problems… • Dupe detection • Anti-spam • News • Topic classification • Metrics, quality analysis • Trending • Search • User biases, mood • Many more… We are HIRING!!!
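
The scaling figure on slide 25 can be checked with a few lines of arithmetic. This is only a sketch that multiplies the three numbers quoted on the slide; the function name `pairwise_feature_ops` is hypothetical and does not come from the talk.

```python
def pairwise_feature_ops(query_users: int, candidate_users: int, features: int) -> int:
    """Feature-level operations needed for brute-force pairwise similarity."""
    return query_users * candidate_users * features


# Numbers quoted on slide 25: 50K users x 5 million users x 1K features.
total = pairwise_feature_ops(50_000, 5_000_000, 1_000)
print(f"{total:,}")  # 250,000,000,000,000 (250 trillion)
```

At this size a brute-force comparison is impractical, which is the motivation the slide gives for comparing users in a smaller latent-topic space instead.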
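
Slide 32 proposes estimating p(Like | Timespent) from user and content patterns but does not say which model was used. The sketch below is one simple possibility, not StumbleUpon's method: an empirical estimate from explicitly rated stumbles, bucketed by content type and time spent, with add-one smoothing. The bucket edges, function names, and sample data are all invented for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical time-spent bucket edges, in seconds.
BUCKET_EDGES = [5, 15, 30, 60]


def bucket(seconds: float) -> int:
    """Map a time spent to the index of its bucket."""
    return bisect_right(BUCKET_EDGES, seconds)


def fit(rated_stumbles):
    """Count likes and totals per (content_type, time bucket).

    rated_stumbles: iterable of (content_type, seconds, liked) tuples,
    taken from the minority of stumbles that carry an explicit rating.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [likes, total]
    for content_type, seconds, liked in rated_stumbles:
        cell = counts[(content_type, bucket(seconds))]
        cell[0] += int(liked)
        cell[1] += 1
    return counts


def p_like(counts, content_type: str, seconds: float, alpha: float = 1.0) -> float:
    """Smoothed estimate of p(Like | content type, time spent)."""
    likes, total = counts.get((content_type, bucket(seconds)), (0, 0))
    return (likes + alpha) / (total + 2 * alpha)


# Invented sample data: (content type, seconds spent, thumbed up?).
rated = [
    ("video", 45, True),
    ("video", 50, True),
    ("video", 8, False),
    ("image", 8, True),
    ("image", 3, False),
]
model = fit(rated)
print(p_like(model, "video", 40))  # 0.75
print(p_like(model, "text", 20))   # 0.5 (no data, falls back to the prior)
```

Keeping content type in the key reflects the slide's point that time spent on videos is longer than on images: the same eight seconds scores differently for a video than for an image. A per-user or per-device dimension could be added to the key in the same way.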

Editor's notes

  1. By the end of this talk, you will have a good understanding of the problems with discovery, some solutions, and some data insights.
  2. Our goal is to show content that you did not know you would like. To surprise you, enlighten you. Basically, to enable exploration and discovery.
  3. During signup, we ask interesting questions to learn more about you, which helps solve the cold start problem.
  4. Think of discovery as search without a term, plus added complexities, i.e., nothing repeats, etc. For example, if you want to learn about astronomy or genetic algorithms, it's hard to do on search or any other service; it is way more work.
  5. When I started a couple of years back, we had 6M users and 15 employees. Growing rapidly, especially on mobile. Talk about time spent and how users are super hooked.
  6. Users are good at choosing topics that they like. We have had repeated success at increasing the number of topics they pick. But the problem is more about having them pick the right topics for them, e.g., Arts vs. AI. It's not simple to build a user experience that accounts for that and gets us that data. Huge area of research for StumbleUpon: how do we get as much as possible from the user without losing them or setting completely different expectations than what the product is?
  7. Now we have a basic version of the interest graph: some topics you like.
  8. StumbleSense: based on your likes/dislikes, we build a SENSE for other things you may like. Hence we suggest topics, domains, etc. that we think you will like as you stumble along. This makes interest elicitation a part of the core product. You are learning about the user, and the user understands the product a lot: a dialogue, back and forth. Notice that we give the reason why it was recommended. Transparency is very important.
  9. Leverage other networks you are part of to get data about what you like and jumpstart the interest graph.
  10. Also, show suggested stumblers, interests, etc.
  11. A denser interest graph. Affinity and confidence for each interest vary, and depending on that we can exploit or explore.
  12. When new content is discovered/ingested, how do we determine if it's good or not? You will always have exceptions that need to be handled. For example: domains such as YouTube, basically UGC, in which content is diverse; you need to build models that account for that. User features of the raters/discoverers: just because a spammer rated cnn.com, you can't ignore it. Look at multiple sources of information and decide whether the URL is worth sampling or not.
  13. Now, one way of doing this is to use a random forest with content features.
  14. And we can also sample to experts. That's one huge advantage SU has: the fact that we can decide which site to send and get data for that URL. But sometimes you could be recommending bad content to the expert; you get around this by telling the expert that we think he is an expert and we need to get more data from him about the URL. Again, transparency for the win. Transparency allows us to set the right expectations.
  15. One way of defining experts is as users who thumb-up high-quality pages and thumb-down low-quality pages. There are multiple ways you can find high-quality pages: have a seed of experts pick URLs and use them to find other experts; or look at your current quality scores, see which users' ratings are more predictive of them, and use those users as experts; or use social endorsement: have users rate others as experts, or use external data sources, similar to what Klout is doing. This is a very hard problem.
  16. How do you match the right content to the right user? User expectations are very different. When you say you like cars and I say I like cars, we are not talking about the same thing. We need to understand the interest graph more deeply.
  17. One solution is to find other users that are similar to you. But just because you are similar to me in Physics does not mean I would like the music you listen to.
  18. One solution: figure out latent topics and then use them to cluster/find similar users.
  19. Now we have an interest graph that is both explicit and implicit.
  20. Different users have a varying method mix. We learn the mix and balance it. But this needs to account for mood; for example, we see that you like stumbling news in the morning and videos on the weekend. But there are always exceptions.
  21. Context, i.e., showing why a recommendation was shown to a user, is very important. There should be a back and forth. Recommendations should be very transparent. Context can be that your friend on Facebook liked it, or that this is trending in Politics.
  22. The immediate conclusion is that the quality of recommendations is not good. But this is both thumb-ups and thumb-downs. Stumbling is cheap, and so clicking the stumble button is better than rating. One could argue that we are doing a really good job and the marginal utility of rating is not high. Solutions: use other data such as time spent to figure out what you like. Make you rate more ;) Work very closely with product on what we can do to remind the user that their ratings matter.
  23. Now we know we need to use time spent. Last stumble, time spent. Great, we have a solution.
  24. Mobile – shorter attention spans,