Predicting Customer Conversion
with Random Forests
A Decision Trees Case Study




Daniel Gerlanc, Principal
Enplus Advisors, Inc.
www.enplusadvisors.com
dgerlanc@enplusadvisors.com
Topics
• Objectives: Research Question
• Data: Bank Prospect Conversion
• Methods: Decision Trees, Random Forests
• Results
Objective

• Which customers or prospects should
  you call today?
• To whom should you offer incentives?
Dataset

• Direct marketing campaign for bank
  loans
• http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
• 45,211 records, 17 attributes (16 predictors plus the outcome)
Decision Trees

              Windy    Coat
        yes

Sunny                 No Coat

        no    Coat
Statistical Decision Trees

• Randomness: outcomes are noisy, not fixed rules
• We may not know the relationships ahead
  of time, so the tree has to learn them from the data
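Splitting

• At each node, choose the variable and cut point that best separate the outcomes
• Deterministic process: the same data always produce the same splits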
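A minimal base R sketch of the splitting step for a single numeric predictor, assuming Gini impurity as the criterion (rpart's default for classification): try each midpoint between observed values and keep the one with the lowest size-weighted impurity of the two child nodes. `gini()`, `best_split()` and the toy call-duration data are hypothetical illustrations, not rpart internals.

```r
# Gini impurity of a set of class labels: 1 - sum(p_k^2)
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Exhaustive search for the threshold on numeric x that minimizes the
# size-weighted Gini impurity of the children (x <= t vs. x > t)
best_split <- function(x, y) {
  xs <- sort(unique(x))
  if (length(xs) < 2) return(NULL)
  # Candidate thresholds: midpoints between consecutive distinct values
  thresholds <- (head(xs, -1) + tail(xs, -1)) / 2
  n <- length(y)
  scores <- sapply(thresholds, function(t) {
    left  <- y[x <= t]
    right <- y[x > t]
    (length(left) / n) * gini(left) + (length(right) / n) * gini(right)
  })
  i <- which.min(scores)
  list(threshold = thresholds[i], impurity = scores[i])
}

# Hypothetical toy data: call duration (seconds) vs. whether the prospect converted
duration  <- c(30, 45, 60, 90, 200, 250, 300, 400)
converted <- factor(c("no", "no", "no", "no", "yes", "no", "yes", "yes"))
best_split(duration, converted)
```

On this toy data the search picks the threshold 145 (weighted impurity 0.1875), putting the four short calls, all "no", in a pure left node. Because the search is exhaustive and greedy, rerunning it on the same data always returns the same split.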
Decision Tree Code
 library(rpart)
 tree.1 <- rpart(takes.loan ~ ., data = bank)  # classification tree when takes.loan is a factor




• See the 'rpart' and 'rpart.plot' R packages.
• Many parameters available to control the fit.
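A sketch of controlling the fit with `rpart.control()`. So that it runs on its own, it uses the `kyphosis` data that ships with rpart as a stand-in for the bank data (an assumption, not the deck's setup). `minsplit = 20` and `cp = 0.01` are rpart's defaults; `maxdepth = 5` is lowered from the default of 30 purely for illustration.

```r
library(rpart)

# Stopping rules that limit how far the tree grows
ctrl <- rpart.control(
  minsplit = 20,  # fewest observations a node needs before a split is tried
  cp = 0.01,      # skip splits that don't improve the fit by at least this factor
  maxdepth = 5    # deepest allowed node (the root counts as depth 0)
)

# kyphosis ships with rpart; it stands in for the bank data here
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", control = ctrl)

printcp(fit)  # complexity table, useful when pruning with prune(fit, cp = ...)
```

If the rpart.plot package is installed, `rpart.plot::rpart.plot(fit)` draws the resulting tree.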
Make Predictions
predict(tree.1, type = "class")  # predicted labels for the training data
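To connect predictions back to the objective of deciding whom to call, class probabilities can be used to rank prospects. A sketch, again using rpart's bundled `kyphosis` data as a hypothetical stand-in for the bank data. When `newdata` is omitted, `predict()` returns fitted values for the training rows, so these are in-sample predictions.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for the bank data here
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# No newdata: predictions are for the rows the tree was trained on
cls  <- predict(fit, type = "class")  # predicted labels (a factor)
prob <- predict(fit, type = "prob")   # one probability column per class

# Rank rows by the predicted probability of the positive class
ranked <- kyphosis[order(prob[, "present"], decreasing = TRUE), ]
head(ranked)

# Confusion matrix laid out like the slides: rows = predicted, columns = actual
table(Predicted = cls, Actual = kyphosis$Kyphosis)
```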
How'd it do?
 Guessing precision: 11.7%
 Decision tree precision: 64.4% (recall: 34.9%)
                           Actual
Predicted   no                yes
no          (1)   38,904      (3)   3,444
yes         (2)   1,018       (4)   1,845
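The percentages can be recomputed from the confusion matrix itself. A minimal base R sketch using the decision tree counts: precision = TP / (TP + FP) is the share of predicted "yes" prospects that convert, recall = TP / (TP + FN) is the share of converters the model finds, and guessing has a precision equal to the overall conversion rate. With the real data the matrix would come from something like `table(Predicted = predict(tree.1, type = "class"), Actual = bank$takes.loan)`, assuming `takes.loan` is a yes/no factor.

```r
# Decision tree counts: rows = predicted, columns = actual
cm <- matrix(c(38904, 1018, 3444, 1845), nrow = 2,
             dimnames = list(Predicted = c("no", "yes"),
                             Actual    = c("no", "yes")))

precision <- function(cm) cm["yes", "yes"] / sum(cm["yes", ])  # TP / (TP + FP)
recall    <- function(cm) cm["yes", "yes"] / sum(cm[, "yes"])  # TP / (TP + FN)
base_rate <- function(cm) sum(cm[, "yes"]) / sum(cm)           # precision of guessing

round(100 * c(guess = base_rate(cm), precision = precision(cm), recall = recall(cm)), 1)
```

For these counts the sketch gives 11.7% for guessing, 64.4% precision and 34.9% recall.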
Decision Tree Problems

• Tend to overfit the data
• High variance: small changes in the data can produce a very different tree
• Greedy splitting is not globally optimal
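One way to see the high variance: refit the same kind of tree on bootstrap resamples of the data and record which variable ends up at the root. A sketch using rpart's bundled `kyphosis` data as a hypothetical stand-in; how much the root split varies depends on the data and the seed.

```r
library(rpart)

set.seed(1)  # reproducible resamples for this illustration

# Refit a tree on 25 bootstrap resamples and record the root split variable
root_var <- replicate(25, {
  rows <- sample.int(nrow(kyphosis), replace = TRUE)
  fit <- rpart(Kyphosis ~ Age + Number + Start,
               data = kyphosis[rows, ], method = "class")
  as.character(fit$frame$var[1])  # "<leaf>" if the tree made no split
})

table(root_var)  # how often each variable was chosen first
```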
Random Forests

[Figure: one decision tree vs. many decision trees (an ensemble)]
Building RF

• Draw a bootstrap sample (with replacement) from the data
• At each split, consider only a random sample of the available
  variables
• Repeat for each tree, then combine the trees' votes
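Why more than one tree?

• Random sampling makes the trees less correlated
• Averaging many trees reduces the variance of the predictor
• Continual cross-validation: each tree can be tested on the
  out-of-bag rows it never saw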
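The first two bullets can be made precise with a standard result: for B identically distributed trees, each with variance σ² and pairwise correlation ρ, the variance of their average is ρσ² + (1 − ρ)σ²/B. Adding trees shrinks the second term, but only decorrelating them lowers the first. The third bullet rests on each bootstrap sample leaving out about (1 − 1/n)^n ≈ 37% of the rows. A small base R sketch of both quantities; the ρ values are arbitrary illustrations.

```r
# Variance of the average of B identically distributed predictors with
# variance sigma2 and pairwise correlation rho
avg_variance <- function(B, rho, sigma2 = 1) {
  rho * sigma2 + (1 - rho) * sigma2 / B
}

avg_variance(B = 500, rho = 0)    # uncorrelated trees: 0.002
avg_variance(B = 500, rho = 0.3)  # correlated trees: 0.3014, a floor of 0.3 remains

# Expected share of rows left out of one bootstrap sample of size n
oob_fraction <- function(n) (1 - 1 / n)^n

oob_fraction(45211)  # about 0.368, close to exp(-1)
```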
Random Forests
library(randomForest)
rffit.1 <- randomForest(takes.loan ~ ., data = bank)  # a factor response gives classification




The most important parameters are:
 Parameter   Description                                Default

 ntree       Number of trees to grow                    500
 mtry        Number of variables randomly sampled       floor(sqrt(p)) for classification,
             as candidates at each split                max(floor(p / 3), 1) for regression,
                                                        where p = number of predictors
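The `mtry` defaults can be written as a small helper. It mirrors the defaults documented for R's randomForest package; `default_mtry()` itself is a hypothetical helper, not part of that package. With the bank data's 16 predictors, a classification forest tries 4 variables at each split.

```r
# Default mtry as documented for the randomForest package:
# floor(sqrt(p)) for classification, max(floor(p / 3), 1) for regression,
# where p is the number of predictors
default_mtry <- function(p, type = c("classification", "regression")) {
  type <- match.arg(type)
  if (type == "classification") floor(sqrt(p)) else max(floor(p / 3), 1)
}

default_mtry(16)                # bank data, classification: 4
default_mtry(16, "regression")  # 5
```

Both defaults can be overridden in the call, e.g. `randomForest(takes.loan ~ ., data = bank, ntree = 1000, mtry = 6)`; those values are arbitrary.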
How'd it do?
Guessing precision: 11.7%
Random forest precision: 64.5% (recall: 48.0%)
                           Actual
Predicted   no                yes
no          (1)   38,526      (3)   2,748
yes         (2)   1,396       (4)   2,541
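randomForest keeps its out-of-bag confusion matrix in `rffit.1$confusion`, with rows for the actual class, columns for the predicted class, and an extra `class.error` column. Read that way, the random forest counts 38,526 / 1,396 / 2,748 / 2,541 match the dataset's 39,922 "no" and 5,289 "yes" clients. A base R sketch that transposes such a matrix into the predicted-by-actual layout and recomputes precision and recall; treating these counts as that OOB matrix is an assumption.

```r
# Counts arranged like randomForest's OOB confusion matrix:
# rows = actual class, columns = predicted class
oob <- matrix(c(38526, 2748, 1396, 2541), nrow = 2,
              dimnames = list(Actual    = c("no", "yes"),
                              Predicted = c("no", "yes")))

cm <- t(oob)  # transpose: rows = predicted, columns = actual

tp <- cm["yes", "yes"]
metrics <- c(precision = tp / sum(cm["yes", ]),  # TP / (TP + FP)
             recall    = tp / sum(cm[, "yes"]))  # TP / (TP + FN)
round(100 * metrics, 1)  # precision 64.5, recall 48.0

colSums(cm)  # actual class totals: no = 39922, yes = 5289
```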
Benefits of RF

• Don't need much tuning
• Don't need an extra cross-validation step: the out-of-bag error
  serves as a built-in estimate
• Many implementations
 • R, Weka, RapidMiner, Mahout
References
•   Breiman, Leo, Jerome Friedman, Richard Olshen, and Charles Stone.
    Classification and Regression Trees. Belmont, Calif: Wadsworth
    International Group, 1984. Print.

•   Breiman, Leo and Adele Cutler. Random forests.
    http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm

•   S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank
    Direct Marketing: An Application of the CRISP-DM Methodology. In P.
    Novais et al. (Eds.), Proceedings of the European Simulation and
    Modelling Conference - ESM'2011, pp. 117-121, Guimarães, Portugal,
    October, 2011. EUROSIS.


Random Forests Lightning Talk


Editor's Notes

  1. Tools that help you decide how to spend those limited resources.