BUILDING A PREDICTIVE MODEL
AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE

Alex Lin
Senior Architect
Intelligent Mining
alin@intelligentmining.com
Outline
- Predictive modeling methodology
- k-Nearest Neighbor (kNN) algorithm
- Singular value decomposition (SVD) method for dimensionality reduction
- Using a synthetic data set to test and improve your model
- Experiment and results
The Business Problem
- Design a product recommender solution that will increase revenue.
How Do We Increase Revenue?

  Increase Revenue
  ├─ Increase Conversion
  └─ Increase Avg. Order Value
     ├─ Increase Unit Price
     └─ Increase Units / Order
Example
- Is this recommendation effective?

  [Screenshot: a product recommendation, annotated with the two levers it
   pulls — Increase Unit Price and Increase Units / Order]
What am I going to do?
Predictive Model
- Framework:

    Data --> Features --> ML Algorithm --> Prediction Output

    Data:       What data?
    Features:   What features?
    Algorithm:  Which algorithm?
    Output:     Cross-sell & Up-sell Recommendation
What Data to Use?
- Explicit data
  - Ratings
  - Comments
- Implicit data
  - Order history / Return history
  - Cart events
  - Page views
  - Click-thru
  - Search log
- In today's talk we only use Order history and Cart events.
Predictive Model

    Data --> Features --> ML Algorithm --> Prediction Output

    Data:       Order History, Cart Events
    Features:   What features?
    Algorithm:  Which algorithm?
    Output:     Cross-sell & Up-sell Recommendation
What Features to Use?
- We know that a given product tends to get purchased by customers with
  similar tastes or needs.
- Use user engagement data to describe a product.

  [Figure: one row of the item-user matrix — item 17's sparse user engagement
   vector over users 1..n, with entries such as 1 and .25]
Data Representation / Features
- When we merge every item's user engagement vector, we get an m x n
  item-user matrix. (A sketch of loading raw engagement rows into this
  matrix follows below.)

  [Figure: m x n item-user matrix — rows are items, columns are users; each
   row is that item's sparse engagement vector with entries such as 1 and .25]
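A minimal sketch of that loading step, assuming the logs have already been joined into (user_id, item_id, weight) rows like the synthetic data sample shown later in this deck. Keeping the strongest signal per (item, user) pair is an illustrative choice, not something the slides specify:

#include <stdio.h>

// Sketch (assumed I/O format): fold raw engagement rows into the
// m x n item-user matrix. data[item][user] is assumed allocated and zeroed.
void load_matrix(FILE *fp, double **data) {
    long user_id, item_id;
    double weight;   // e.g. 1 for an order, .25 for a cart event
    while (fscanf(fp, "%ld %ld %lf", &user_id, &item_id, &weight) == 3) {
        // assumption: if a user touched an item several times,
        // keep the strongest engagement signal
        if (weight > data[item_id][user_id])
            data[item_id][user_id] = weight;
    }
}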
Data Normalization
- Ensure the magnitudes of the entries in the dataset matrix are appropriate.

  [Figure: the same item-user matrix with each raw entry shown above its
   normalized weight, e.g. 1 -> .5, 1 -> .9, 1 -> .92, 1 -> .49]

- Remove column average – so frequent buyers don't dominate the model.
Data Normalization
- Different engagement data points (Order / Cart / Page View) should have
  different weights.
- Common normalization strategies (a sketch of the first one follows below):
  - Remove column average
  - Remove row average
  - Remove global mean
  - Z-score
  - Fill-in the null values
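A minimal sketch of column-average removal, assuming the dense data[][] layout used in the code fragment later in the deck, and assuming the average is taken over a user's observed (non-zero) entries only; the slides don't pin down either detail:

// Sketch: subtract each user's (column's) average engagement from that
// user's observed entries, so frequent buyers don't dominate the model.
void remove_column_average(double **data, long i_cnt, long u_cnt) {
    for (long u = 0; u < u_cnt; u++) {
        double sum = 0;
        long nnz = 0;                       // observed (non-zero) entries only
        for (long i = 0; i < i_cnt; i++)
            if (data[i][u] != 0) { sum += data[i][u]; nnz++; }
        if (nnz == 0) continue;             // user with no engagement at all
        double avg = sum / nnz;
        for (long i = 0; i < i_cnt; i++)
            if (data[i][u] != 0)
                data[i][u] -= avg;          // missing values stay untouched
    }
}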
Predictive Model

    Data --> Features --> ML Algorithm --> Prediction Output

    Data:       Order History, Cart Events
    Features:   User engagement vector + Data Normalization
    Algorithm:  Which algorithm?
    Output:     Cross-sell & Up-sell Recommendation
Which Algorithm?
- How do we find the items that have similar user engagement data?

  [Figure: item-user matrix with items 17 and 18 highlighted — their
   engagement vectors overlap on several of the same users]

- We can find the items that have similar user engagement vectors with the
  kNN algorithm.
k-Nearest Neighbor (kNN)
- Find the k items that have the most similar user engagement vectors.

  [Figure: item-user matrix — items 1, 2, and 3 have the engagement vectors
   most similar to item 4's]

  Nearest Neighbors of Item 4 = [2, 3, 1]
Similarity Measure for kNN

  Example: two rows of the item-user matrix
    item 2:  user 2 = 1,              user 7 = .5,             user 10 = 1
    item 4:  user 2 = 1, user 5 = .5, user 7 = 1,  user 9 = 1

- Jaccard coefficient:

    sim(a,b) = |a ∩ b| / |a ∪ b|
             = (1+1) / [(1+1+1) + (1+1+1+1) − (1+1)]

- Cosine similarity:

    sim(a,b) = cos(a,b) = (a · b) / (‖a‖ * ‖b‖)
             = (1*1 + 0.5*1)
               / [sqrt(1^2 + 0.5^2 + 1^2) * sqrt(1^2 + 0.5^2 + 1^2 + 1^2)]

- Pearson correlation:

    corr(a,b) = Σ_i (r_ai − r̄_a)(r_bi − r̄_b)
                / [ sqrt(Σ_i (r_ai − r̄_a)^2) * sqrt(Σ_i (r_bi − r̄_b)^2) ]

              = (m Σ a_i b_i − Σ a_i Σ b_i)
                / [ sqrt(m Σ a_i^2 − (Σ a_i)^2) * sqrt(m Σ b_i^2 − (Σ b_i)^2) ]

              = (match_cols * Dotprod(a,b) − sum(a) * sum(b))
                / [ sqrt(match_cols * sum(a^2) − (sum(a))^2)
                    * sqrt(match_cols * sum(b^2) − (sum(b))^2) ]

  (All three translate directly into code; see the sketch below.)
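A minimal sketch of the three measures over two dense engagement vectors a and b of length n. One assumption for brevity: this Pearson variant sums over all n columns, rather than restricting to the co-rated columns as the match_cols form above does:

#include <math.h>

// Sketch: similarity measures over dense vectors a[], b[] of length n,
// where 0 means "no engagement".

double jaccard(const double *a, const double *b, long n) {
    long both = 0, any = 0;                 // |a ∩ b| and |a ∪ b|
    for (long i = 0; i < n; i++) {
        if (a[i] != 0 && b[i] != 0) both++;
        if (a[i] != 0 || b[i] != 0) any++;
    }
    return any ? (double)both / any : 0;
}

double cosine(const double *a, const double *b, long n) {
    double dot = 0, na = 0, nb = 0;
    for (long i = 0; i < n; i++) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return (na > 0 && nb > 0) ? dot / (sqrt(na) * sqrt(nb)) : 0;
}

double pearson(const double *a, const double *b, long n) {
    double sa = 0, sb = 0, saa = 0, sbb = 0, sab = 0;
    for (long i = 0; i < n; i++) {
        sa += a[i];  sb += b[i];
        saa += a[i] * a[i];  sbb += b[i] * b[i];
        sab += a[i] * b[i];
    }
    double den = sqrt(n * saa - sa * sa) * sqrt(n * sbb - sb * sb);
    return den > 0 ? (n * sab - sa * sb) / den : 0;
}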
k-Nearest Neighbor (kNN)

  [Figure: items plotted in feature space, with cosine similarity as the
   similarity measure; with k = 5, item 8's neighborhood is its 5 nearest
   points]

  kNN, k = 5:  Nearest Neighbors(8) = [9, 6, 3, 1, 2]
Predictive Model
- Ver. 1: kNN

    Data:       Order History, Cart Events
    Features:   User engagement vector + Data Normalization
    Algorithm:  k-Nearest Neighbor (kNN)
    Output:     Cross-sell & Up-sell Recommendation
Cosine Similarity – Code fragment

long i_cnt = 100000;    // number of items: 100K
long u_cnt = 2000000;   // number of users: 2M
double **data;          // 100K x 2M dataset matrix (in reality, it needs malloc allocation)
double *norm;           // vector norm of each item's engagement vector
long i, j, f;
double dot_product;

// assume data matrix is loaded
......

// calculate the vector norm of each user engagement vector
for (i = 0; i < i_cnt; i++) {
    norm[i] = 0;
    for (f = 0; f < u_cnt; f++) {
        norm[i] += data[i][f] * data[i][f];
    }
    norm[i] = sqrt(norm[i]);
}

// cosine similarity calculation
for (i = 0; i < i_cnt; i++) {          // loop thru 100K items
    for (j = 0; j < i_cnt; j++) {      // loop thru 100K items
        dot_product = 0;
        for (f = 0; f < u_cnt; f++) {  // loop thru the entire user space: 2M
            dot_product += data[i][f] * data[j][f];
        }
        printf("%ld %ld %lf\n", i, j, dot_product / (norm[i] * norm[j]));
    }
}
// find the Top K nearest neighbors here
......

Two problems with this brute-force approach:
1. 100K rows x 100K rows x 2M features --> scalability problem.
   Remedies: kd-trees, locality-sensitive hashing, MapReduce/Hadoop,
   multicore/threading, stream processors.
2. data[i] is high-dimensional and sparse, so similarity measures are not
   reliable --> accuracy problem.

This leads us to SVD dimensionality reduction!
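Before moving on to SVD: the fragment elides the top-K step. A minimal sketch of what it could look like, assuming the similarities of one item against all others were kept in a buffer sim[] rather than printed:

#include <stdlib.h>

// Sketch: pick the k items most similar to item `self`, given sim[j] =
// cosine similarity between item `self` and item j. O(k * i_cnt) selection,
// which is fine for small k.
void top_k_neighbors(const double *sim, long i_cnt, long self,
                     int k, long *nn /* out: k neighbor ids */) {
    char *used = calloc(i_cnt, 1);
    used[self] = 1;                          // an item is not its own neighbor
    for (int found = 0; found < k; found++) {
        long best = -1;
        for (long j = 0; j < i_cnt; j++)
            if (!used[j] && (best < 0 || sim[j] > sim[best]))
                best = j;
        if (best < 0) break;                 // fewer than k candidates
        used[best] = 1;
        nn[found] = best;
    }
    free(used);
}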
Singular Value Decomposition (SVD)

  A = U × S × V^T

    A:    m x n item-user matrix
    U:    m x r matrix
    S:    r x r diagonal matrix of singular values
    V^T:  r x n matrix

  Truncate to rank k < r:

    A_k = U_k × S_k × V_k^T

  Low-rank approx. item profile:       U_k * S_k
  Low-rank approx. user profile:       S_k * V_k^T
  Low-rank approx. item-user matrix:   U_k * S_k * S_k * V_k^T

  (A sketch of forming the item profiles follows below.)
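In code, forming the low-rank item profiles is just a column scaling of U_k. A minimal sketch, assuming U is stored row-major as m x k and s[] holds the top k singular values (as produced by, e.g., one of the SVD libraries listed later in the deck):

// Sketch: item profiles P = U_k * S_k. Row i of P is item i's position in
// the k-dimensional latent-factor space; kNN then runs on these short
// vectors instead of the full 2M-dimensional engagement vectors.
void item_profiles(const double *U, const double *s,
                   long m, int k, double *P /* out: m x k, row-major */) {
    for (long i = 0; i < m; i++)
        for (int f = 0; f < k; f++)
            P[i * k + f] = U[i * k + f] * s[f];   // scale axis f by sigma_f
}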
Reduced SVD

  A_k = U_k × S_k × V_k^T

    A_k:    100K x 2M matrix
    U_k:    100K x 3 matrix
    S_k:    3 x 3 matrix = diag(7, 3, 1)   (descending singular values)
    V_k^T:  3 x 2M matrix   (rank = 3)

  Low-rank approx. item profile: U_k * S_k
SVD Factor Interpretation

  [Figure: singular values plot (rank = 512), values in descending order,
   next to the S = diag(7, 3, 1) example — the head of the curve holds the
   more significant latent factors; the tail is noise and other less
   significant effects]
SVD Dimensionality Reduction

  U_k * S_k describes each item by a short vector of latent factors (a low
  rank of, say, 3 to 10) instead of ~2M user columns.

  [Figure: items x latent-factors matrix, far narrower than the original
   items x users matrix]

  Need to find the optimal low rank!
Missing values
- Difference between "0" and "unknown"
- Missing values do NOT appear randomly.
- Value = (Preference Factors) + (Availability) – (Purchased elsewhere)
  – (Navigation inefficiency) – etc.
- Approx. Value = (Preference Factors) +/- (Noise)
- Modeling missing values correctly will help us make good recommendations,
  especially when working with an extremely sparse data set.
Singular Value Decomposition (SVD)
- Use SVD to reduce dimensionality, so neighborhood formation happens in the
  reduced user space.
- SVD helps the model find the low-rank approximation of the dataset matrix,
  retaining the critical latent factors while ignoring noise.
- The optimal low rank needs to be tuned.
- SVD is computationally expensive.
- SVD libraries:
  - Matlab: [U, S, V] = svds(A, 256);
  - SVDPACKC: http://www.netlib.org/svdpack/
  - SVDLIBC: http://tedlab.mit.edu/~dr/SVDLIBC/
  - GHAPACK: http://www.dcs.shef.ac.uk/~genevieve/ml.html
Predictive Model
- Ver. 2: SVD + kNN

    Data:       Order History, Cart Events
    Features:   User engagement vector + Data Normalization + SVD
    Algorithm:  k-Nearest Neighbors (kNN) in reduced space
    Output:     Cross-sell & Up-sell Recommendation
Synthetic Data Set
- Why do we use a synthetic data set?
- So we can test our new model in a controlled environment.
Synthetic Data Set
- 16-latent-factor synthetic e-commerce data set
  - Dimension: 1,000 (items) by 20,000 (users)
  - 16 user preference factors
  - 16 item property factors (non-negative)
  - Txn Set: n = 55,360, sparsity = 99.72%
  - Txn+Cart Set: n = 192,985, sparsity = 99.03%
  - Download: http://www.IntelligentMining.com/dataset/

  Sample rows:
    user_id  item_id  type
    10       42       0.25
    10       997      0.25
    10       950      0.25
    11       836      0.25
    11       225      1
Synthetic Data Set

  Item property factors     User preference factors     Purchase likelihood scores
  (1K x 16 matrix)       x  (16 x 20K matrix)        =  (1K x 20K matrix)

  X32 = (a, b, c) · (x, y, z) = a*x + b*y + c*z,
  where (a, b, c) is Item 3's row of property factors and (x, y, z) is
  User 2's column of preference factors.

  X32 = likelihood of Item 3 being purchased by User 2

  (A sketch of this factor product follows below.)
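A minimal sketch of the factor product, with the matrices passed as fixed-size arrays for brevity (in practice they would be heap-allocated):

#define N_ITEMS   1000
#define N_USERS   20000
#define N_FACTORS 16

// Sketch: X[i][u] = dot product of item i's property factors with
// user u's preference factors = likelihood of user u purchasing item i.
void likelihood_scores(const double item[N_ITEMS][N_FACTORS],
                       const double user[N_FACTORS][N_USERS],
                       double X[N_ITEMS][N_USERS]) {
    for (int i = 0; i < N_ITEMS; i++)
        for (int u = 0; u < N_USERS; u++) {
            double dot = 0;
            for (int f = 0; f < N_FACTORS; f++)
                dot += item[i][f] * user[f][u];
            X[i][u] = dot;
        }
}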
Synthetic Data Set

  Generating purchases for one user (illustrated with User 1's likelihood
  scores X11..X51):
  1. Sort the user's column by purchase likelihood score
     (X11, X21, X31, X41, X51 -> X31, X41, X21, X51, X11).
  2. Based on the distribution, pre-determine the # of items purchased by
     the user (here, # of items = 2).
  3. From the top, select and skip certain items to create data sparsity.

  Result: User 1 purchased Item 4 and Item 1.

  (A sketch of this procedure follows below.)
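A minimal sketch of the select-and-skip step for one user, assuming n_buy was already drawn from the purchase-count distribution and using a fixed skip width; the slides don't specify the exact skip pattern:

#include <stdlib.h>

static const double *g_scores;               // column of likelihood scores
static int by_score_desc(const void *a, const void *b) {
    double da = g_scores[*(const int *)a], db = g_scores[*(const int *)b];
    return (da < db) - (da > db);            // sort high scores first
}

// Sketch: sort one user's items by likelihood, then repeatedly take one
// item and skip `skip` items, creating sparsity in the synthetic set.
void synth_purchases(const double *scores, int n_items, int n_buy,
                     int skip, int *bought /* out: n_buy item ids */) {
    int *order = malloc(n_items * sizeof *order);
    for (int i = 0; i < n_items; i++) order[i] = i;
    g_scores = scores;
    qsort(order, n_items, sizeof *order, by_score_desc);
    for (int taken = 0, pos = 0; taken < n_buy && pos < n_items; pos += 1 + skip)
        bought[taken++] = order[pos];
    free(order);
}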
Experiment Setup
- Each model (Random / kNN / SVD+kNN) will generate top-20 recommendations
  for each item.
- Compare model output to the actual top 20 provided by the synthetic
  data set.
- Evaluation metrics (a sketch of the precision computation follows below):
  - Precision %: overlap of the top 20 between model output and actual
    (higher is better)

      Precision = |{Found_Top20_items} ∩ {Actual_Top20_items}|
                  / |{Found_Top20_items}|

  - Quality metric: average of the actual rankings of the recommended items
    (lower is better), e.g. actual ranks [1, 2, 30, 47, 50, 21] score better
    than [1, 2, 368, 62, 900, 510].
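The precision metric is a plain set overlap; a minimal sketch:

// Sketch: Precision = |found ∩ actual| / |found| for two top-20 lists
// of item ids.
double precision_at_20(const long found[20], const long actual[20]) {
    int overlap = 0;
    for (int i = 0; i < 20; i++)
        for (int j = 0; j < 20; j++)
            if (found[i] == actual[j]) { overlap++; break; }
    return overlap / 20.0;
}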
Experimental Result
- kNN vs. Random (Control)

  [Charts: Precision % (higher is better) and Quality (lower is better) for
   the kNN model against the random control]
Experimental Result
- Precision % of SVD+kNN

  [Chart: Precision/Recall % (higher is better) vs. SVD rank, with the
   improvement annotated]
Experimental Result
- Quality of SVD+kNN

  [Chart: Quality (lower is better) vs. SVD rank, with the improvement
   annotated]
Experimental Result
- The effect of using Cart data

  [Chart: Precision % (higher is better) vs. SVD rank, with and without
   Cart data]
Experimental Result
- The effect of using Cart data

  [Chart: Quality (lower is better) vs. SVD rank, with and without Cart data]
Outline
- Predictive modeling methodology
- k-Nearest Neighbor (kNN) algorithm
- Singular value decomposition (SVD) method for dimensionality reduction
- Using a synthetic data set to test and improve your model
- Experiment and results
References
- J.S. Breese, D. Heckerman and C. Kadie, "Empirical Analysis of Predictive
  Algorithms for Collaborative Filtering," in Proceedings of the Fourteenth
  Conference on Uncertainty in Artificial Intelligence (UAI 1998), 1998.
- B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Item-Based Collaborative
  Filtering Recommendation Algorithms," in Proceedings of the Tenth
  International Conference on the World Wide Web (WWW 10), pp. 285-295, 2001.
- B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Application of
  Dimensionality Reduction in Recommender System – A Case Study," in ACM
  WebKDD 2000 Web Mining for E-Commerce Workshop, 2000.
- Apache Lucene Mahout: http://lucene.apache.org/mahout/
- Cofi: A Java-Based Collaborative Filtering Library:
  http://www.nongnu.org/cofi/
Thank you
- Any questions or comments?

More Related Content

What's hot

Recommender Engines
Recommender EnginesRecommender Engines
Recommender EnginesThomas Hess
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introductionLiang Xiang
 
Recommendation system
Recommendation system Recommendation system
Recommendation system Vikrant Arya
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System ExplainedCrossing Minds
 
Recommendation system
Recommendation systemRecommendation system
Recommendation systemRishabh Mehta
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender systemStanley Wang
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 
Boston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender SystemsBoston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender SystemsJames Kirk
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemMilind Gokhale
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Xavier Amatriain
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation SystemsRobin Reni
 
Recent advances in deep recommender systems
Recent advances in deep recommender systemsRecent advances in deep recommender systems
Recent advances in deep recommender systemsNAVER Engineering
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation enginesGeorgian Micsa
 
Recommender systems using collaborative filtering
Recommender systems using collaborative filteringRecommender systems using collaborative filtering
Recommender systems using collaborative filteringD Yogendra Rao
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender SystemsDavid Zibriczky
 
Recommender systems for E-commerce
Recommender systems for E-commerceRecommender systems for E-commerce
Recommender systems for E-commerceAlexander Konduforov
 

What's hot (20)

Recommender Engines
Recommender EnginesRecommender Engines
Recommender Engines
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introduction
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
Boston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender SystemsBoston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender Systems
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Recommender system
Recommender systemRecommender system
Recommender system
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
 
Recent advances in deep recommender systems
Recent advances in deep recommender systemsRecent advances in deep recommender systems
Recent advances in deep recommender systems
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
 
Recommender systems using collaborative filtering
Recommender systems using collaborative filteringRecommender systems using collaborative filtering
Recommender systems using collaborative filtering
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender Systems
 
Recommender systems for E-commerce
Recommender systems for E-commerceRecommender systems for E-commerce
Recommender systems for E-commerce
 

Viewers also liked

Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYC
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYCJeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYC
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYCMLconf
 
A Data Scientist in the Music Industry
A Data Scientist in the Music IndustryA Data Scientist in the Music Industry
A Data Scientist in the Music IndustryData Science London
 
Past present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectivePast present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectiveXavier Amatriain
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systemsFalitokiniaina Rabearison
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Xavier Amatriain
 
How to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on SparkHow to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on SparkCaserta
 
RecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at CriteoRecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at CriteoRomain Lerallut
 
Collaborative Filtering with Spark
Collaborative Filtering with SparkCollaborative Filtering with Spark
Collaborative Filtering with SparkChris Johnson
 
Intro to Factorization Machines
Intro to Factorization MachinesIntro to Factorization Machines
Intro to Factorization MachinesPavel Kalaidin
 
Lecture 6 lu factorization & determinants - section 2-5 2-7 3-1 and 3-2
Lecture 6   lu factorization & determinants - section 2-5 2-7 3-1 and 3-2Lecture 6   lu factorization & determinants - section 2-5 2-7 3-1 and 3-2
Lecture 6 lu factorization & determinants - section 2-5 2-7 3-1 and 3-2njit-ronbrown
 
Neighbor methods vs matrix factorization - case studies of real-life recommen...
Neighbor methods vs matrix factorization - case studies of real-life recommen...Neighbor methods vs matrix factorization - case studies of real-life recommen...
Neighbor methods vs matrix factorization - case studies of real-life recommen...Domonkos Tikk
 
Matrix factorization
Matrix factorizationMatrix factorization
Matrix factorizationrubyyc
 
آموزش محاسبات عددی - بخش دوم
آموزش محاسبات عددی - بخش دومآموزش محاسبات عددی - بخش دوم
آموزش محاسبات عددی - بخش دومfaradars
 
Nonnegative Matrix Factorization
Nonnegative Matrix FactorizationNonnegative Matrix Factorization
Nonnegative Matrix FactorizationTatsuya Yokota
 
Factorization Machines with libFM
Factorization Machines with libFMFactorization Machines with libFM
Factorization Machines with libFMLiangjie Hong
 
Matrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender SystemsMatrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender SystemsAladejubelo Oluwashina
 
Collaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro AnalyticsCollaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro AnalyticsNavisro Analytics
 
Introduction to Matrix Factorization Methods Collaborative Filtering
Introduction to Matrix Factorization Methods Collaborative FilteringIntroduction to Matrix Factorization Methods Collaborative Filtering
Introduction to Matrix Factorization Methods Collaborative FilteringDKALab
 
Beginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBeginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBenjamin Bengfort
 
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...MLconf
 

Viewers also liked (20)

Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYC
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYCJeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYC
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYC
 
A Data Scientist in the Music Industry
A Data Scientist in the Music IndustryA Data Scientist in the Music Industry
A Data Scientist in the Music Industry
 
Past present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectivePast present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry Perspective
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 
How to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on SparkHow to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on Spark
 
RecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at CriteoRecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at Criteo
 
Collaborative Filtering with Spark
Collaborative Filtering with SparkCollaborative Filtering with Spark
Collaborative Filtering with Spark
 
Intro to Factorization Machines
Intro to Factorization MachinesIntro to Factorization Machines
Intro to Factorization Machines
 
Lecture 6 lu factorization & determinants - section 2-5 2-7 3-1 and 3-2
Lecture 6   lu factorization & determinants - section 2-5 2-7 3-1 and 3-2Lecture 6   lu factorization & determinants - section 2-5 2-7 3-1 and 3-2
Lecture 6 lu factorization & determinants - section 2-5 2-7 3-1 and 3-2
 
Neighbor methods vs matrix factorization - case studies of real-life recommen...
Neighbor methods vs matrix factorization - case studies of real-life recommen...Neighbor methods vs matrix factorization - case studies of real-life recommen...
Neighbor methods vs matrix factorization - case studies of real-life recommen...
 
Matrix factorization
Matrix factorizationMatrix factorization
Matrix factorization
 
آموزش محاسبات عددی - بخش دوم
آموزش محاسبات عددی - بخش دومآموزش محاسبات عددی - بخش دوم
آموزش محاسبات عددی - بخش دوم
 
Nonnegative Matrix Factorization
Nonnegative Matrix FactorizationNonnegative Matrix Factorization
Nonnegative Matrix Factorization
 
Factorization Machines with libFM
Factorization Machines with libFMFactorization Machines with libFM
Factorization Machines with libFM
 
Matrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender SystemsMatrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender Systems
 
Collaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro AnalyticsCollaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro Analytics
 
Introduction to Matrix Factorization Methods Collaborative Filtering
Introduction to Matrix Factorization Methods Collaborative FilteringIntroduction to Matrix Factorization Methods Collaborative Filtering
Introduction to Matrix Factorization Methods Collaborative Filtering
 
Beginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBeginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix Factorization
 
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
 

Similar to Building a Recommendation Engine - An example of a product recommendation engine

2013 01-23 when analytics projects go wrong
2013 01-23 when analytics projects go wrong2013 01-23 when analytics projects go wrong
2013 01-23 when analytics projects go wrongJulien Coquet
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptopRising Media, Inc.
 
Black_Friday_Sales_Trushita
Black_Friday_Sales_TrushitaBlack_Friday_Sales_Trushita
Black_Friday_Sales_TrushitaTrushita Redij
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Kun Le
 
Agile Workshop: Agile Metrics
Agile Workshop: Agile MetricsAgile Workshop: Agile Metrics
Agile Workshop: Agile MetricsSiddhi
 
Scikit Learn: Data Normalization Techniques That Work
Scikit Learn: Data Normalization Techniques That WorkScikit Learn: Data Normalization Techniques That Work
Scikit Learn: Data Normalization Techniques That WorkDamian R. Mingle, MBA
 
Know How to Create and Visualize a Decision Tree with Python.pdf
Know How to Create and Visualize a Decision Tree with Python.pdfKnow How to Create and Visualize a Decision Tree with Python.pdf
Know How to Create and Visualize a Decision Tree with Python.pdfData Science Council of America
 
Citizen Data Science Training using KNIME
Citizen Data Science Training using KNIMECitizen Data Science Training using KNIME
Citizen Data Science Training using KNIMEAli Raza Anjum
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Benjamin Bengfort
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareTigerGraph
 
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and TricksIBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and TricksSenturus
 
Deep-Dive: Predicting Customer Behavior with Apigee Insights
Deep-Dive: Predicting Customer Behavior with Apigee InsightsDeep-Dive: Predicting Customer Behavior with Apigee Insights
Deep-Dive: Predicting Customer Behavior with Apigee InsightsApigee | Google Cloud
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon Web Services
 
Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Turi, Inc.
 
Key projects in AI, ML and Generative AI
Key projects in AI, ML and Generative AIKey projects in AI, ML and Generative AI
Key projects in AI, ML and Generative AIVijayananda Mohire
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...Egyptian Engineers Association
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learningbutest
 

Similar to Building a Recommendation Engine - An example of a product recommendation engine (20)

2013 01-23 when analytics projects go wrong
2013 01-23 when analytics projects go wrong2013 01-23 when analytics projects go wrong
2013 01-23 when analytics projects go wrong
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
Black_Friday_Sales_Trushita
Black_Friday_Sales_TrushitaBlack_Friday_Sales_Trushita
Black_Friday_Sales_Trushita
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...
 
Agile Workshop: Agile Metrics
Agile Workshop: Agile MetricsAgile Workshop: Agile Metrics
Agile Workshop: Agile Metrics
 
Scikit Learn: Data Normalization Techniques That Work
Scikit Learn: Data Normalization Techniques That WorkScikit Learn: Data Normalization Techniques That Work
Scikit Learn: Data Normalization Techniques That Work
 
Know How to Create and Visualize a Decision Tree with Python.pdf
Know How to Create and Visualize a Decision Tree with Python.pdfKnow How to Create and Visualize a Decision Tree with Python.pdf
Know How to Create and Visualize a Decision Tree with Python.pdf
 
E miner
E minerE miner
E miner
 
Citizen Data Science Training using KNIME
Citizen Data Science Training using KNIMECitizen Data Science Training using KNIME
Citizen Data Science Training using KNIME
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA Hardware
 
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and TricksIBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks
 
Z suzanne van_den_bosch
Z suzanne van_den_boschZ suzanne van_den_bosch
Z suzanne van_den_bosch
 
Deep-Dive: Predicting Customer Behavior with Apigee Insights
Deep-Dive: Predicting Customer Behavior with Apigee InsightsDeep-Dive: Predicting Customer Behavior with Apigee Insights
Deep-Dive: Predicting Customer Behavior with Apigee Insights
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)
 
Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015
 
Lean startup
Lean startupLean startup
Lean startup
 
Key projects in AI, ML and Generative AI
Key projects in AI, ML and Generative AIKey projects in AI, ML and Generative AI
Key projects in AI, ML and Generative AI
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 

More from NYC Predictive Analytics

Graph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media AnalyticsGraph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media AnalyticsNYC Predictive Analytics
 
The caret Package: A Unified Interface for Predictive Models
The caret Package: A Unified Interface for Predictive ModelsThe caret Package: A Unified Interface for Predictive Models
The caret Package: A Unified Interface for Predictive ModelsNYC Predictive Analytics
 
Intro to Classification: Logistic Regression & SVM
Intro to Classification: Logistic Regression & SVMIntro to Classification: Logistic Regression & SVM
Intro to Classification: Logistic Regression & SVMNYC Predictive Analytics
 
Introduction to R Package Recommendation System Competition
Introduction to R Package Recommendation System CompetitionIntroduction to R Package Recommendation System Competition
Introduction to R Package Recommendation System CompetitionNYC Predictive Analytics
 
Optimization: A Framework for Predictive Analytics
Optimization: A Framework for Predictive AnalyticsOptimization: A Framework for Predictive Analytics
Optimization: A Framework for Predictive AnalyticsNYC Predictive Analytics
 
An Introduction to Multilevel Regression Modeling for Prediction
An Introduction to Multilevel Regression Modeling for PredictionAn Introduction to Multilevel Regression Modeling for Prediction
An Introduction to Multilevel Regression Modeling for PredictionNYC Predictive Analytics
 
How OMGPOP Uses Predictive Analytics to Drive Change
How OMGPOP Uses Predictive Analytics to Drive ChangeHow OMGPOP Uses Predictive Analytics to Drive Change
How OMGPOP Uses Predictive Analytics to Drive ChangeNYC Predictive Analytics
 
Introduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic AnalysisIntroduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic AnalysisNYC Predictive Analytics
 

More from NYC Predictive Analytics (10)

Graph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media AnalyticsGraph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media Analytics
 
The caret Package: A Unified Interface for Predictive Models
The caret Package: A Unified Interface for Predictive ModelsThe caret Package: A Unified Interface for Predictive Models
The caret Package: A Unified Interface for Predictive Models
 
Intro to Classification: Logistic Regression & SVM
Intro to Classification: Logistic Regression & SVMIntro to Classification: Logistic Regression & SVM
Intro to Classification: Logistic Regression & SVM
 
Introduction to R Package Recommendation System Competition
Introduction to R Package Recommendation System CompetitionIntroduction to R Package Recommendation System Competition
Introduction to R Package Recommendation System Competition
 
R package Recommendation Engine
R package Recommendation EngineR package Recommendation Engine
R package Recommendation Engine
 
Optimization: A Framework for Predictive Analytics
Optimization: A Framework for Predictive AnalyticsOptimization: A Framework for Predictive Analytics
Optimization: A Framework for Predictive Analytics
 
An Introduction to Multilevel Regression Modeling for Prediction
An Introduction to Multilevel Regression Modeling for PredictionAn Introduction to Multilevel Regression Modeling for Prediction
An Introduction to Multilevel Regression Modeling for Prediction
 
How OMGPOP Uses Predictive Analytics to Drive Change
How OMGPOP Uses Predictive Analytics to Drive ChangeHow OMGPOP Uses Predictive Analytics to Drive Change
How OMGPOP Uses Predictive Analytics to Drive Change
 
Introduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic AnalysisIntroduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic Analysis
 
Recommendation Engine Demystified
Recommendation Engine DemystifiedRecommendation Engine Demystified
Recommendation Engine Demystified
 

Recently uploaded

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

Commit 2024 - Secret Management made easy
SIP trunking in Janus @ Kamailio World 2024
"Federated learning: out of reach no matter how close", Oleksandr Lapshyn
DevoxxFR 2024 Reproducible Builds with Apache Maven
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Vertex AI Gemini Prompt Engineering Tips
Human Factors of XR: Using Human Factors to Design XR Systems
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel", Oleks...
DMCC Future of Trade Web3 - Special Edition
Connect Wave/ connectwave Pitch Deck Presentation
Powerpoint exploring the locations used in television show Time Clash
My INSURER PTE LTD - Insurtech Innovation Award 2024
Dev Dives: Streamline document processing with UiPath Studio Web
CloudStudio User manual (basic edition):
"ML in Production", Oleksandr Bagan
The Future of Software Development - Devin AI Innovative Approach.pdf
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
My Hashitalk Indonesia April 2024 Presentation

Building a Recommendation Engine - An example of a product recommendation engine

• 1. BUILDING A PREDICTIVE MODEL: AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE. Alex Lin, Senior Architect, Intelligent Mining, alin@intelligentmining.com
• 2. Outline: Predictive modeling methodology; k-Nearest Neighbor (kNN) algorithm; Singular value decomposition (SVD) method for dimensionality reduction; Using a synthetic data set to test and improve your model; Experiment and results.
• 3. The Business Problem: Design a product recommender solution that will increase revenue.
• 4. How Do We Increase Revenue? [Diagram: Increase Revenue branches into Increase Conversion and Increase Avg. Order Value; the latter branches into Increase Unit Price and Increase Units / Order.]
• 5. Example: Is this recommendation effective? [Diagram: the recommendation maps to Increase Unit Price and Increase Units / Order.]
• 6. What am I going to do?
• 7. Predictive Model Framework: Data → Features → ML Algorithm → Prediction Output. The open questions: What data? What features? Which algorithm? The output: Cross-sell & Up-sell Recommendation.
• 8. What Data to Use? Explicit data: ratings, comments. Implicit data: order history / return history, cart events, page views, click-thru, search log. In today’s talk we only use order history and cart events.
• 9. Predictive Model: Data = Order History + Cart Events → Features (what features?) → ML Algorithm (which algorithm?) → Prediction Output = Cross-sell & Up-sell Recommendation.
• 10. What Features to Use? We know that a given product tends to get purchased by customers with similar tastes or needs. Use user engagement data to describe a product. [Illustration: item 17 represented as a sparse user engagement vector across users 1..n.]
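To make the feature concrete, here is a minimal sketch of how such a sparse engagement vector could be stored; the struct names and the 1 / 0.25 weighting are illustrative assumptions, not something the deck specifies:

    /* Hypothetical sparse storage for one item's user engagement vector:
     * only the non-zero (user, score) pairs are kept, since most of a
     * 2M-user row is empty. Scores such as 1 (order) vs. 0.25 (cart
     * event) are illustrative weights. */
    typedef struct {
        int    user_id;
        double score;
    } Engagement;

    typedef struct {
        int         item_id;
        int         n_entries;   /* number of users who engaged */
        Engagement *entries;     /* sorted by user_id for fast merges */
    } ItemVector;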
• 11. Data Representation / Features: When we merge every item’s user engagement vector, we get an m x n item-user matrix. [Illustration: sparse m x n item-user matrix of engagement scores.]
• 12. Data Normalization: Ensure the magnitudes of the entries in the dataset matrix are appropriate. [Illustration: item-user matrix after normalization.] Remove the column average, so frequent buyers don’t dominate the model.
• 13. Data Normalization: Different engagement data points (order / cart / page view) should have different weights. Common normalization strategies: remove column average; remove row average; remove global mean; Z-score; fill in the null values.
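As a minimal sketch of the first strategy, column-average removal (a dense toy matrix for clarity; the real 100K x 2M matrix would need a sparse representation, and the sizes here are illustrative):

    /* Remove each user's (column) average so frequent buyers
     * don't dominate the model. Toy dense matrix for clarity. */
    #define M 4   /* items */
    #define N 6   /* users */

    void remove_column_average(double data[M][N]) {
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int i = 0; i < M; i++)
                sum += data[i][j];
            double avg = sum / M;
            for (int i = 0; i < M; i++)
                data[i][j] -= avg;   /* center the column */
        }
    }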
• 14. Predictive Model: Data = Order History + Cart Events → Features = user engagement vector, with data normalization → ML Algorithm (which algorithm?) → Cross-sell & Up-sell Recommendation.
• 15. Which Algorithm? How do we find the items that have similar user engagement data? [Illustration: item-user matrix in which items 17 and 18 have similar engagement patterns.] We can find the items that have similar user engagement vectors with the kNN algorithm.
• 16. k-Nearest Neighbor (kNN): Find the k items that have the most similar user engagement vectors. [Illustration: toy item-user matrix.] Nearest Neighbors of Item 4 = [2, 3, 1].
• 17. Similarity Measure for kNN, worked on two toy rows, a = item 2 and b = item 4:
Jaccard coefficient (counting non-zero entries): $\mathrm{sim}(a,b) = \frac{|a \cap b|}{|a| + |b| - |a \cap b|} = \frac{(1+1)}{(1+1+1) + (1+1+1+1) - (1+1)}$.
Cosine similarity: $\mathrm{sim}(a,b) = \cos(a,b) = \frac{a \cdot b}{\|a\|\,\|b\|} = \frac{1 \cdot 1 + 0.5 \cdot 1}{\sqrt{1^2 + 0.5^2 + 1^2} \cdot \sqrt{1^2 + 0.5^2 + 1^2 + 1^2}}$.
Pearson correlation: $\mathrm{corr}(a,b) = \frac{\sum_i (r_{ai} - \bar r_a)(r_{bi} - \bar r_b)}{\sqrt{\sum_i (r_{ai} - \bar r_a)^2}\,\sqrt{\sum_i (r_{bi} - \bar r_b)^2}} = \frac{m \sum a_i b_i - \sum a_i \sum b_i}{\sqrt{m \sum a_i^2 - (\sum a_i)^2}\,\sqrt{m \sum b_i^2 - (\sum b_i)^2}}$, computed over matching columns only: $\frac{\mathit{match\_cols} \cdot \mathrm{Dotprod}(a,b) - \mathrm{sum}(a)\,\mathrm{sum}(b)}{\sqrt{\mathit{match\_cols} \cdot \mathrm{sum}(a^2) - (\mathrm{sum}(a))^2}\,\sqrt{\mathit{match\_cols} \cdot \mathrm{sum}(b^2) - (\mathrm{sum}(b))^2}}$.
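A hedged C sketch of the Jaccard and Pearson measures above (the deck's own code fragment on slide 20 covers cosine); dense rows for brevity, function names are illustrative:

    #include <math.h>

    /* Jaccard over the non-zero supports of rows a and b of length n. */
    double jaccard(const double *a, const double *b, int n) {
        int na = 0, nb = 0, both = 0;
        for (int i = 0; i < n; i++) {
            if (a[i] != 0) na++;
            if (b[i] != 0) nb++;
            if (a[i] != 0 && b[i] != 0) both++;
        }
        return (double)both / (double)(na + nb - both);
    }

    /* Pearson correlation in the single-pass form from slide 17. */
    double pearson(const double *a, const double *b, int n) {
        double sa = 0, sb = 0, saa = 0, sbb = 0, sab = 0;
        for (int i = 0; i < n; i++) {
            sa  += a[i];         sb  += b[i];
            saa += a[i] * a[i];  sbb += b[i] * b[i];
            sab += a[i] * b[i];
        }
        double num = n * sab - sa * sb;
        double den = sqrt(n * saa - sa * sa) * sqrt(n * sbb - sb * sb);
        return den != 0 ? num / den : 0;
    }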
• 18. k-Nearest Neighbor (kNN). [Illustration: items plotted in item feature space; using cosine similarity with k = 5, Nearest Neighbors(8) = [9, 6, 3, 1, 2].]
• 19. Predictive Model, Ver. 1: kNN. Data = Order History + Cart Events → Features = user engagement vector, with data normalization → k-Nearest Neighbor (kNN) → Cross-sell & Up-sell Recommendation.
• 20. Cosine Similarity – Code fragment:

    long i_cnt = 100000;         // number of items: 100K
    long u_cnt = 2000000;        // number of users: 2M
    double data[i_cnt][u_cnt];   // 100K by 2M dataset matrix (in reality, it needs to be malloc allocation)
    double norm[i_cnt];
    // assume data matrix is loaded ...

    // calculate vector norm for each user engagement vector
    for (i = 0; i < i_cnt; i++) {
        norm[i] = 0;
        for (f = 0; f < u_cnt; f++) {
            norm[i] += data[i][f] * data[i][f];
        }
        norm[i] = sqrt(norm[i]);
    }

    // cosine similarity calculation
    for (i = 0; i < i_cnt; i++) {          // loop thru 100K items
        for (j = 0; j < i_cnt; j++) {      // loop thru 100K items
            dot_product = 0;
            for (f = 0; f < u_cnt; f++) {  // loop thru entire user space (2M)
                dot_product += data[i][f] * data[j][f];
            }
            printf("%d %d %lf\n", i, j, dot_product / (norm[i] * norm[j]));
        }
    }
    // find the Top K nearest neighbors here ...

Two problems called out on the slide: (1) 100K rows x 100K rows x 2M features is a scalability problem; remedies include kd-trees, locality-sensitive hashing, MapReduce/Hadoop, multicore/threading, and stream processors. (2) data[i] is high-dimensional and sparse, so similarity measures are not reliable, an accuracy problem. This leads us to SVD dimensionality reduction!
• 21. Singular Value Decomposition (SVD): $A = U \times S \times V^T$, where A is the m x n item-user matrix, U is m x r, S is r x r, and $V^T$ is r x n. Truncating to rank k < r gives $A_k = U_k \times S_k \times V_k^T$. The low-rank approx. item profile is $U_k S_k$; the low-rank approx. user profile is $S_k V_k^T$; the low-rank approx. item-user matrix is $U_k S_k \cdot S_k V_k^T$.
• 22. Reduced SVD: $A_k = U_k \times S_k \times V_k^T$, with $A_k$ 100K x 2M, $U_k$ 100K x 3, $S_k$ 3 x 3, $V_k^T$ 3 x 2M (rank = 3). [Illustration: $S_k = \mathrm{diag}(7, 3, 1)$, singular values in descending order.] The low-rank approx. item profile is $U_k S_k$.
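As a worked example of that item profile, using the toy singular values from the slide (the row of $U_k$ is an illustrative assumption): multiplying a row of $U_k$ by the diagonal $S_k$ simply scales each coordinate by its singular value,

\[
u_{17} = (0.2,\; -0.1,\; 0.4), \qquad
S_k = \begin{pmatrix} 7 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\;\Longrightarrow\;
u_{17} S_k = (0.2 \cdot 7,\; -0.1 \cdot 3,\; 0.4 \cdot 1) = (1.4,\; -0.3,\; 0.4).
\]

Each item is now a 3-dimensional profile instead of a 2M-dimensional raw engagement vector.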
• 23. SVD Factor Interpretation. [Figure: singular values plot at rank = 512, e.g. S = diag(7, 3, 1, …), values descending; the leading factors are the more significant latent factors, the tail is noise plus other, less significant factors.]
• 24. SVD Dimensionality Reduction. [Figure: the item profile $U_k S_k$, items by latent factors, with the rank swept from 3 to 10 and beyond.] We need to find the optimal low rank!
• 25. Missing values: Note the difference between “0” and “unknown”; missing values do NOT appear randomly. Value = (Preference Factors) + (Availability) – (Purchased elsewhere) – (Navigation inefficiency) – etc. Approx. Value = (Preference Factors) +/- (Noise). Modeling missing values correctly will help us make good recommendations, especially when working with an extremely sparse data set.
• 26. Singular Value Decomposition (SVD): Use SVD to reduce dimensionality, so neighborhood formation happens in the reduced user space. SVD helps the model find the low-rank approx. dataset matrix, retaining the critical latent factors while ignoring noise. The optimal low rank needs to be tuned. SVD is computationally expensive. SVD libraries: Matlab: [U, S, V] = svds(A, 256); SVDPACKC: http://www.netlib.org/svdpack/; SVDLIBC: http://tedlab.mit.edu/~dr/SVDLIBC/; GHAPACK: http://www.dcs.shef.ac.uk/~genevieve/ml.html
• 27. Predictive Model, Ver. 2: SVD+kNN. Data = Order History + Cart Events → Features = user engagement vector, with data normalization → SVD → k-Nearest Neighbors (kNN) in reduced space → Cross-sell & Up-sell Recommendation.
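A hedged sketch of what "kNN in reduced space" buys (the rank and names here are illustrative; 256 echoes the svds(A, 256) call on slide 26): the cosine computation from slide 20 now runs over k-dimensional item profiles rather than 2M-dimensional raw rows, so the inner loop shrinks from 2M to k multiplications per item pair:

    #include <math.h>

    #define I_CNT 100000   /* items */
    #define K     256      /* chosen SVD rank (illustrative) */

    /* profiles[i] = row i of Uk * Sk, assumed precomputed by the SVD step. */
    double cosine_reduced(const double profiles[][K], int i, int j) {
        double dot = 0, ni = 0, nj = 0;
        for (int f = 0; f < K; f++) {   /* K multiplications, not 2M */
            dot += profiles[i][f] * profiles[j][f];
            ni  += profiles[i][f] * profiles[i][f];
            nj  += profiles[j][f] * profiles[j][f];
        }
        return dot / (sqrt(ni) * sqrt(nj));
    }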
• 28. Synthetic Data Set: Why do we use a synthetic data set? So we can test our new model in a controlled environment.
• 29. Synthetic Data Set: a 16-latent-factor synthetic e-commerce data set. Dimension: 1,000 (items) by 20,000 (users); 16 user preference factors; 16 item property factors (non-negative). Txn set: n = 55,360, sparsity = 99.72%. Txn+Cart set: n = 192,985, sparsity = 99.03%. Download: http://www.IntelligentMining.com/dataset/ [Sample rows (user_id, item_id, type): (10, 42, 0.25), (10, 997, 0.25), (10, 950, 0.25), (11, 836, 0.25), (11, 225, 1).]
• 30. Synthetic Data Set: item property factors (1K x 16 matrix) times user preference factors (16 x 20K matrix) yields the purchase likelihood score matrix (1K x 20K). [Illustration: $X_{32} = (a, b, c) \cdot (x, y, z) = a x + b y + c z$, the likelihood of Item 3 being purchased by User 2.]
• 31. Synthetic Data Set: Based on the distribution, pre-determine the number of items purchased by a user (here, # of items = 2). Sort the items by purchase likelihood score; from the top, select and skip certain items to create data sparsity. [Illustration: User 1 purchased Item 4 and Item 1.]
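A hedged sketch of the generation step just described; the function names, sizes, and the coin-flip skip rule are illustrative assumptions (the deck does not specify how items are skipped):

    /* Score every item for one user as the dot product of item property
     * and user preference factors, sort descending, then walk from the
     * top, skipping some items to create sparsity. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N_ITEMS   1000
    #define N_FACTORS 16

    typedef struct { int item; double score; } Scored;

    static int by_score_desc(const void *a, const void *b) {
        double d = ((const Scored *)b)->score - ((const Scored *)a)->score;
        return (d > 0) - (d < 0);
    }

    /* item_f: N_ITEMS x N_FACTORS item property factors;
     * user_f: one user's N_FACTORS preference factors;
     * n_buy:  pre-determined number of purchases for this user. */
    void generate_purchases(const double item_f[][N_FACTORS],
                            const double user_f[N_FACTORS], int n_buy) {
        Scored s[N_ITEMS];
        for (int i = 0; i < N_ITEMS; i++) {
            double dot = 0;
            for (int k = 0; k < N_FACTORS; k++)
                dot += item_f[i][k] * user_f[k];
            s[i].item  = i;
            s[i].score = dot;
        }
        qsort(s, N_ITEMS, sizeof(Scored), by_score_desc);
        for (int i = 0, bought = 0; i < N_ITEMS && bought < n_buy; i++) {
            if (rand() % 2) continue;   /* skip some items -> sparsity */
            printf("purchase item %d\n", s[i].item);
            bought++;
        }
    }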
• 32. Experiment Setup: Each model (Random / kNN / SVD+kNN) generates the top 20 recommendations for each item, and the model output is compared to the actual top 20 provided by the synthetic data set. Evaluation metrics: Precision %, the overlap of the top 20 between model output and actual (higher is better): $\mathrm{Precision} = \frac{|\{Found\_Top20\_items\} \cap \{Actual\_Top20\_items\}|}{|\{Found\_Top20\_items\}|}$. Quality, the average of the actual rankings in the model output (lower is better). [Example: actual ranks (1, 2, 30, 47, 50, 21), average ≈ 25, beat (1, 2, 368, 62, 900, 510), average ≈ 307.]
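A hedged sketch of the two evaluation metrics above (function and parameter names are illustrative, not from the deck):

    #define TOP_K 20

    /* Precision: fraction of the model's top-20 that also appears in
     * the actual top-20 (higher is better). */
    double precision_at_20(const int found[TOP_K], const int actual[TOP_K]) {
        int hits = 0;
        for (int i = 0; i < TOP_K; i++)
            for (int j = 0; j < TOP_K; j++)
                if (found[i] == actual[j]) { hits++; break; }
        return (double)hits / TOP_K;
    }

    /* Quality: average actual ranking of the recommended items (lower
     * is better); actual_rank[i] is the true rank of the i-th
     * recommendation. */
    double quality(const int actual_rank[TOP_K]) {
        long sum = 0;
        for (int i = 0; i < TOP_K; i++)
            sum += actual_rank[i];
        return (double)sum / TOP_K;
    }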
• 33. Experimental Result: kNN vs. Random (control). [Charts: Precision % (higher is better) and Quality (lower is better).]
• 34. Experimental Result: Precision % of SVD+kNN. [Chart: precision/recall % (higher is better) against SVD rank, showing the improvement.]
• 35. Experimental Result: Quality of SVD+kNN. [Chart: Quality (lower is better) against SVD rank, showing the improvement.]
• 36. Experimental Result: the effect of using Cart data. [Chart: Precision % (higher is better) against SVD rank.]
• 37. Experimental Result: the effect of using Cart data. [Chart: Quality (lower is better) against SVD rank.]
• 38. Outline: Predictive modeling methodology; k-Nearest Neighbor (kNN) algorithm; Singular value decomposition (SVD) method for dimensionality reduction; Using a synthetic data set to test and improve your model; Experiment and results.
• 39. References: J. S. Breese, D. Heckerman, and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI 1998), 1998. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-Based Collaborative Filtering Recommendation Algorithms," in Proceedings of the Tenth International Conference on the World Wide Web (WWW 10), pp. 285-295, 2001. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Application of Dimensionality Reduction in Recommender System: A Case Study," in ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000. Apache Lucene Mahout: http://lucene.apache.org/mahout/ Cofi: A Java-Based Collaborative Filtering Library: http://www.nongnu.org/cofi/
• 40. Thank you. Any questions or comments?