Building
Recommendation
Engine
 Keeyong Han, Jan 2013
Table of Contents
1. What is Recommendation?
2. Different Recommendation Strategies
3. Introduction of Hadoop/Mahout
4. Building Recommendation Engine with
   Hadoop/Mahout
5. How to use Mahout
6. Q&A
What is
Recommendation?
Definition of
Recommendation Engine
"A recommendation system provides
information or items that are likely to be of
interest to a user, in an automated fashion"
- Alpa Jain from Twitter

"Serve the right item to users in an
automated fashion to optimize long-term
business objectives"
- Deepak Agarwal from Yahoo
Examples
•   Related Product (Amazon)
•   Movie Recommendation (Netflix)
•   News Contents (Yahoo)
•   Online Dating (eHarmony)
•   Search Autocomplete (Google)
•   Connection Recommendation (LinkedIn)
•   Song Recommendation (Pandora)
•   Walmart – (Physical) Store Layout
Why Recommendation?
•   A way for users to find content of interest
    (from a large selection) with less effort.
     o A natural path to personalization
     o Serendipity factor

•   For companies, a good way to introduce
    new and unknown content
Different
Recommendation
Strategies
Item vs. User
Item based
recommendation (1)
1. Content-based Item Recommendation.
  o   Using item metadata, compute the similarity
      between items.
      i. Description, price, category and so on
      ii. Normalize these into a feature vector (numeric
          values)
             i.   You can think of it as a point in N-dimensional space.
      iii.   Compute the distance or similarity between vectors.
             i.   Euclidean Distance Score
             ii. Cosine Similarity Score
             iii. Pearson Correlation Score
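
The three scores listed above can be sketched in plain Python. This is only an illustration: the item vectors and their three features are hypothetical, and mapping the Euclidean distance to 1 / (1 + distance) is one common convention, not something the slides specify.

```python
import math

def euclidean_score(a, b):
    """Turn Euclidean distance into a score in (0, 1]; 1.0 means identical vectors."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)

def cosine_score(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def pearson_score(a, b):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

# Hypothetical items, each normalized to (price, category code, rating).
shirt = [0.20, 1.0, 0.8]
blouse = [0.25, 1.0, 0.7]
sofa = [0.90, 0.0, 0.6]

for name, other in (("blouse", blouse), ("sofa", sofa)):
    print(name,
          round(euclidean_score(shirt, other), 3),
          round(cosine_score(shirt, other), 3),
          round(pearson_score(shirt, other), 3))
```

With all three scores, a higher value means the two items are more alike, so the most similar items for a given item are simply the top-scoring ones.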
Item based
recommendation (2)
2.       Collaborative Filtering.
     o    Leverage users’ collective intelligence
           ▪ Similar users tend to like similar items
           ▪ Amazon’s product recommendation is a
             well-known example
     o    Covered in more detail later
User based
recommendation
• First group users into different clusters
    o   Represent users as feature vectors
         ▪ Information about users:
            •   geo-location, gender, age, …
         ▪ Items users liked or rated
    o   K-nearest neighbors (KNN) is commonly used
•   From each cluster, find representative items
    o   Some kind of graph traversal
    o   Highest rated items
    o   Most liked items
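
A minimal sketch of the idea, assuming a nearest-neighbor lookup stands in for an explicit clustering step: find the k users closest to the target user, then recommend the items those neighbors liked most. The user IDs, feature vectors and liked items are all hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_users(target, users, k):
    """Return the k user IDs whose vectors are most similar to the target's."""
    scored = [(cosine(users[target], vec), uid)
              for uid, vec in users.items() if uid != target]
    scored.sort(reverse=True)
    return [uid for _, uid in scored[:k]]

def recommend(target, users, likes, k=2, top_n=3):
    """Most-liked items among the k nearest users, minus what the target already likes."""
    already_liked = likes.get(target, set())
    counts = Counter()
    for uid in nearest_users(target, users, k):
        for item in likes.get(uid, set()):
            if item not in already_liked:
                counts[item] += 1
    return [item for item, _ in counts.most_common(top_n)]

# Hypothetical users as normalized feature vectors.
users = {
    "u1": [1.0, 0.0, 0.30],
    "u2": [0.9, 0.1, 0.35],
    "u3": [0.0, 1.0, 0.80],
    "u4": [1.0, 0.0, 0.25],
}
likes = {
    "u1": {"dress"},
    "u2": {"dress", "boots"},
    "u3": {"sofa"},
    "u4": {"boots", "scarf"},
}

print(recommend("u1", users, likes))  # ['boots', 'scarf']
```

Picking "most liked" as the representative-item rule is just one of the options listed above; highest-rated items would slot into the same place.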
Challenges of
Recommendation Engine
• Cold Start
    o   For new users and/or items, there is no information to
        leverage.
•   Sparse Data
    o   Item reviews or purchases are not very common.
•   Scalability Issue
    o   The bigger the data gets, the more computation is
        needed.
Introduction of
Hadoop/Mahout
What is Hadoop?
•   An open source distributed computation and
    storage platform modeled after the Google File
    System and the MapReduce framework
•   Well suited to large-scale offline batch
    processing, but not to real-time processing
•   Widely used in many companies
What is Mahout?
•   An open source machine learning library
    written in Java, with two execution modes:
    1. Standalone
    2. MapReduce
       o Supports large-scale offline batch processing.

•   Covers the following
    o Recommendation/Collaborative Filtering.
    o Classification: Supervised Learning.
    o Clustering: Unsupervised Learning.
Building
Recommendation Engine
with Hadoop/Mahout
Typical Architecture
1. Data Collection
    o   Web server logs, MySQL tables, ...
        (explicit feedback and implicit feedback)
2. Input Data Pre-processing (ETL, Filtering, …)
3. Recommendation Data Building (Mahout)
4. Output Data Post-processing (Re-ordering)
5. Load Final Data To Serving Layer
6. Recommendation Serving Layer
    o   MySQL, NoSQL, Solr/ElasticSearch, ...
•   The batch stages in the middle of the pipeline run on Hadoop
Use Case:
Polyvore – Item Page
•   [Screenshot of a Polyvore item page: the item in question, with a
    "Content Based Recommendation" panel and a "Collaborative Filtering" panel]
Use Case:
Polyvore – Home Page
•   [Screenshot of the Polyvore home page showing "Personalized Recommendation"]
People who liked this
also like ...
• This is based on "Collaborative Filtering"
• Construct a co-occurrence matrix (item
    similarity matrix) – S[NxN]
    o   Increment S[i,j] and S[j,i] if item i and item j are liked
        by the same user
    o   Repeat this for all users and their liked items
•   For item k, take the items that co-occur with it most often
    (from column k or row k) as
    recommendations.
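
The construction above can be sketched in a few lines of Python. The matrix is kept as nested dictionaries rather than a dense NxN array, and the user names and items are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(likes_by_user):
    """Build S, where S[i][j] counts the users who liked both item i and item j."""
    s = defaultdict(lambda: defaultdict(int))
    for items in likes_by_user.values():
        # Each unordered pair of items liked by the same user bumps both cells.
        for i, j in combinations(sorted(set(items)), 2):
            s[i][j] += 1
            s[j][i] += 1
    return s

def also_liked(s, item, top_n=3):
    """Items that co-occur most often with `item` (row `item` of S)."""
    row = s.get(item, {})
    ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]

# Hypothetical "like" data: user -> liked items.
likes_by_user = {
    "alice": ["boots", "scarf", "dress"],
    "bob": ["boots", "scarf"],
    "carol": ["scarf", "dress"],
}

s = build_cooccurrence(likes_by_user)
print(also_liked(s, "boots"))  # ['scarf', 'dress']
```

Ties are broken alphabetically here only to keep the output deterministic; a real system would likely use a secondary signal such as popularity or recency.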
Personalized
Recommendation
• This is based on "Collaborative Filtering"
• Extension of the previous topic
• Computation-wise, it is a matrix multiplication
   a.   First, build a similarity matrix (S) for items
   b.   Next, build a preference vector (P) for the user
   c.   Next, multiply the matrix from a by the vector from b
         R = S x P
   d.   Lastly, sort the elements of the resulting vector R
Polyvore Example
• Assumption:
    o   N items and M users. Users can only like (no rating)
•   Create the item similarity matrix S (NxN)
    o   This is also used for recommendations on the Item page
•   Create the user preference vector P (Nx1)
    o   Set P(i) to 1 for every item i liked by the user in question (0 otherwise)
•   Multiply S by P
    o   Sort the resulting elements by score
    o   This gives the personalized item recommendations
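
As a worked sketch of R = S x P under the like-only assumption: the item names and the small matrix below are hypothetical, and dropping already-liked and zero-score items from the result is an added choice, not something the slides state.

```python
def personalized(s, items, liked, top_n=3):
    """Score every item as R = S x P and return the best unseen (item, score) pairs."""
    n = len(items)
    # P is a 0/1 column vector: 1 where the user liked the item.
    p = [1 if item in liked else 0 for item in items]
    # R[i] = sum over j of S[i][j] * P[j]
    r = [sum(s[i][j] * p[j] for j in range(n)) for i in range(n)]
    candidates = [(items[i], r[i]) for i in range(n)
                  if items[i] not in liked and r[i] > 0]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:top_n]

# Hypothetical catalog and item-item co-occurrence matrix S (symmetric, NxN).
items = ["boots", "scarf", "dress", "hat"]
s = [
    [0, 2, 1, 0],  # boots
    [2, 0, 2, 1],  # scarf
    [1, 2, 0, 0],  # dress
    [0, 1, 0, 0],  # hat
]

print(personalized(s, items, liked={"boots"}))           # [('scarf', 2), ('dress', 1)]
print(personalized(s, items, liked={"boots", "dress"}))  # [('scarf', 4)]
```

Because P is 0/1, each score in R is simply the sum of the co-occurrence counts between a candidate item and everything the user already likes.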
How to use Mahout?
• ItemSimilarityJob class
    •   Main class to compute co-occurrence matrix.
•   RecommenderJob class
    •   Main class to generate personalized
        recommendations.
hadoop jar mahout-core-0.8-job.jar \
   org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
   -Dmapred.input.dir=input/user-item-rating.txt \
   -Dmapred.output.dir=output \
   --usersFile input/users.txt \
   --booleanData \
   --similarityClassname SIMILARITY_COOCCURRENCE \
   --minPrefsPerUser 2 \
   --maxPrefsPerUser 50000

This runs a total of 10 MapReduce jobs to generate the final
recommendations for users
How to use Mahout?
(Cont'd)
• Input File: user-item-rating.txt
  o  One userID,itemID[,rating] record per line.
• How to compute similarity between items
  o   The --similarityClassname parameter selects the measure:
        ▪ CooccurrenceCountSimilarity
        ▪ LogLikelihoodSimilarity
        ▪ TanimotoCoefficientSimilarity
        ▪ CityBlockSimilarity
        ▪ CosineSimilarity
        ▪ PearsonCorrelationSimilarity
        ▪ EuclideanDistanceSimilarity
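
To make the input format and two of the listed measures concrete, here is a small plain-Python sketch. It is not Mahout code: it only shows what a co-occurrence count and a Tanimoto coefficient compute on like-only (boolean) data, and the sample records are hypothetical.

```python
def parse_preferences(lines):
    """Parse 'userID,itemID[,rating]' records into {itemID: set of userIDs}."""
    users_by_item = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(",")
        user_id, item_id = fields[0], fields[1]
        # The optional rating is ignored: the data is treated as boolean likes.
        users_by_item.setdefault(item_id, set()).add(user_id)
    return users_by_item

def cooccurrence_count(users_a, users_b):
    """Number of users who liked both items."""
    return len(users_a & users_b)

def tanimoto(users_a, users_b):
    """Shared users divided by all users who liked either item."""
    union = len(users_a | users_b)
    return len(users_a & users_b) / union if union else 0.0

# Hypothetical contents of user-item-rating.txt.
records = [
    "1,101",
    "1,102",
    "2,101",
    "2,102,4.5",
    "3,102",
    "3,103",
]

by_item = parse_preferences(records)
print(cooccurrence_count(by_item["101"], by_item["102"]))  # 2
print(round(tanimoto(by_item["101"], by_item["102"]), 3))  # 0.667
```

The raw count favors popular items, while the Tanimoto coefficient normalizes by how many users touched either item, which is one reason the choice of measure changes the recommendations.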
How to use Mahout?
(Cont'd)
• Final Output
  o     UserID   [(ItemID,Score),(ItemID,Score),...]
    o   ...



•   Load this from HDFS to a serving layer
    o   Relational Database
    o   Search Engine
    o   NoSQL
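
A minimal sketch of the loading step, using SQLite as a stand-in for the relational serving layer. It assumes the job output has already been copied out of HDFS and that each line looks like `userID<TAB>[itemID:score,itemID:score,...]`; that layout, the table name and the column names are assumptions, so adjust the parser to whatever your job actually writes.

```python
import sqlite3

def parse_line(line):
    """Parse one assumed output line: 'userID<TAB>[itemID:score,itemID:score,...]'."""
    user_id, _, payload = line.strip().partition("\t")
    pairs = []
    for entry in payload.strip("[]").split(","):
        if not entry:
            continue
        item_id, _, score = entry.partition(":")
        pairs.append((int(item_id), float(score)))
    return int(user_id), pairs

def load(lines, conn):
    """Store every recommendation with its position in the user's ranked list."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recommendations ("
        "user_id INTEGER, item_id INTEGER, score REAL, pos INTEGER, "
        "PRIMARY KEY (user_id, item_id))"
    )
    for line in lines:
        if not line.strip():
            continue
        user_id, pairs = parse_line(line)
        conn.executemany(
            "INSERT OR REPLACE INTO recommendations VALUES (?, ?, ?, ?)",
            [(user_id, item_id, score, pos)
             for pos, (item_id, score) in enumerate(pairs, start=1)],
        )
    conn.commit()

def top_items(conn, user_id, limit=10):
    """Serving-side lookup: a user's recommended item IDs in ranked order."""
    rows = conn.execute(
        "SELECT item_id FROM recommendations WHERE user_id = ? ORDER BY pos LIMIT ?",
        (user_id, limit),
    )
    return [row[0] for row in rows]

# Hypothetical output lines standing in for the files under output/.
sample = ["1\t[101:4.5,102:3.0]", "2\t[103:5.0]"]
conn = sqlite3.connect(":memory:")
load(sample, conn)
print(top_items(conn, 1))  # [101, 102]
```

Precomputing the ranked list offline keeps the serving query a plain key lookup, which matches the batch-oriented pipeline described earlier.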
Lessons
• Need to understand the business domain
    o This takes time and effort

•   Garbage In, Garbage Out
    o   Filtering is very important
•   Start with a simple approach
    o And then improve gradually

•   Having an automated pipeline is very important
    o   It makes more experiments possible with less effort
    o   Remember you will have to run lots of experiments
    o   But it is hard and takes time to build
Next stage of
recommendation?
•   Need realtime & scalable
    recommendation technology.
•   Recommendation As A Service.
    •   www.myrrix.com
Q&A
keeyonghan@hotmail.com

Editor's notes

  1. Netflix: 7 days to 1 day. 30M watches per day.
  2. Unexpected discovery!
  3. This is effective when you have a lot more users than items.
  4. 2% of users provide feedback
  5. Make captions more visible and also Likes button on the far left.
  6. Make captions more visible and also Likes button on the far left.
  7. Access log case: lots of robot traffic. What would be the business case for Polyvore? Where is your traffic coming from? What are the users' intentions? Sizes of users and items. Seasonality.