6 Tips for Optimizing TreeNet Gradient Boosting Models
1.
Dan Steinberg, Salford Systems, January 2013
www.salford-systems.com
2.
While TreeNet (Stochastic Gradient Boosting) can work phenomenally well out of the box, it almost always pays to tune your control parameters. Devoting time to optimizing a TreeNet model can noticeably improve its out-of-sample performance. Here are several things recommended for all TreeNet users. © Copyright Salford Systems 2013
3.
Grow enough trees. TreeNet grows 200 trees by default, although you can change that default. In real-world modeling we often find that 1,000 or more trees perform better.
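One way to judge whether enough trees were grown is to look at where the holdout loss bottoms out. The sketch below is not TreeNet output or a TreeNet API: `best_tree_count` is a hypothetical helper and the loss curves are made-up numbers, used only to show the check. If the best model is the very last tree grown, the curve may still be falling and a larger tree budget is worth trying.

```python
def best_tree_count(validation_loss):
    """Return (best_n_trees, hit_ceiling) for a per-tree holdout loss curve.

    validation_loss[i] is the holdout loss of the ensemble using i + 1 trees.
    """
    if not validation_loss:
        raise ValueError("need at least one loss value")
    best_index = min(range(len(validation_loss)), key=validation_loss.__getitem__)
    # If the optimum is the last tree grown, the curve may still be falling.
    hit_ceiling = best_index == len(validation_loss) - 1
    return best_index + 1, hit_ceiling


# Hypothetical loss curves, invented for illustration only.
still_improving = [0.70, 0.61, 0.55, 0.52, 0.50]
levelled_off = [0.70, 0.55, 0.50, 0.51, 0.53]

print(best_tree_count(still_improving))  # optimum at the ceiling: grow more trees
print(best_tree_count(levelled_off))     # optimum inside the budget
```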
4.
A slow learn rate goes hand in hand with growing enough trees, because the slower your learn rate, the more trees you will need. There is nothing wrong with using a learn rate of 0.001 if you are willing to let your machine run through all the trees you will need.
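A rough feel for how the tree count scales with the learn rate comes from an idealized calculation. Assume, purely for illustration, that every tree removes a share of the remaining residual equal to the learn rate, so the residual left after n trees is (1 - learn_rate)^n. Real trees never fit that cleanly, so the numbers below are an order-of-magnitude guide and not a TreeNet rule; `trees_needed` is a hypothetical helper.

```python
import math


def trees_needed(learn_rate, remaining_fraction):
    """Trees required so that (1 - learn_rate) ** n <= remaining_fraction.

    Idealized assumption: each tree removes exactly `learn_rate` of whatever
    residual is left. Actual boosting runs will behave less neatly.
    """
    if not 0.0 < learn_rate < 1.0:
        raise ValueError("learn_rate must be strictly between 0 and 1")
    if not 0.0 < remaining_fraction < 1.0:
        raise ValueError("remaining_fraction must be strictly between 0 and 1")
    return math.ceil(math.log(remaining_fraction) / math.log(1.0 - learn_rate))


# Trees needed to shrink the residual to 1% of its starting size.
for rate in (0.1, 0.01, 0.001):
    print(rate, trees_needed(rate, 0.01))
```

Under this assumption, cutting the learn rate by a factor of ten raises the required number of trees by roughly the same factor.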
5.
The default value of 0.10 means that 10% of the data could be ignored in each training cycle. You ought to experiment with a value of 0.0 to see whether it helps or hurts. You can also try values such as 0.02 or 0.05. Note: if the data are very clean, 0.0 should work best.
6.
If 500 trees are needed when you generate 6-node trees, you might need 1,500 or more when generating just 2-node trees. Sometimes moderately large trees work best: 12-node, 15-node, even 25-node trees could do the trick. Since large trees learn more than smaller trees, you might also need to dial down the learn rate to prevent over-fitting.
7.
Try Battery LOVO (leave one variable out), as this might allow you to remove a variable from the middle of the pack in terms of importance. Try Battery SHAVING to remove the least important variables (shaving from the bottom of the list). Shaving from the top of the list instead tests the viability of dropping the "best" variables.
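The logic behind the two procedures can be sketched in a few lines. This is not how the SPM batteries are implemented: `score` and `importance` are hypothetical callables standing in for "fit a model on these variables and report holdout performance" and "report variable importances", and the toy numbers are invented so the example runs on its own.

```python
def leave_one_variable_out(variables, score):
    """Score the model once per variable, each time with that variable removed."""
    results = {}
    for dropped in variables:
        kept = [v for v in variables if v != dropped]
        results[dropped] = score(kept)
    return results


def shave_from_bottom(variables, score, importance):
    """Repeatedly drop the least important remaining variable, recording each score."""
    kept = list(variables)
    history = [(tuple(kept), score(kept))]
    while len(kept) > 1:
        ranks = importance(kept)  # dict: variable name -> importance
        weakest = min(kept, key=lambda v: ranks[v])
        kept.remove(weakest)
        history.append((tuple(kept), score(kept)))
    return history


# Toy stand-ins (hypothetical): each variable adds or subtracts points of holdout score.
USEFULNESS = {"age": 30, "income": 25, "noise": -5}


def toy_score(kept):
    return sum(USEFULNESS[v] for v in kept)


def toy_importance(kept):
    return {v: abs(USEFULNESS[v]) for v in kept}


names = ["age", "income", "noise"]
print(leave_one_variable_out(names, toy_score))
print(shave_from_bottom(names, toy_score, toy_importance))
```

In the toy data, leaving out `noise` raises the score, which is the kind of signal either procedure is meant to surface.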
8.
First, run some completely additive models. Even 2-node trees can actually allow interactions because of the manner in which TreeNet handles missing values. With the ICL ADDITIVE command, by contrast, you guarantee no possible interactions of any kind, including interactions between missing value indicators created by TreeNet and other variables.
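What "additive" means can be checked on a toy ensemble. The sketch below assumes complete data (no missing values) and uses hand-written 2-node trees, not TreeNet itself; all names and numbers are hypothetical. Because each tree splits on a single variable, the effect of changing `x1` is the same whatever the value of `x2`.

```python
def stump(feature, threshold, left_value, right_value):
    """A 2-node tree: one split on one feature, two terminal values."""
    def predict(row):
        return left_value if row[feature] <= threshold else right_value
    return predict


def ensemble(stumps):
    """Sum the predictions of all stumps, as a boosted model does."""
    def predict(row):
        return sum(s(row) for s in stumps)
    return predict


# Hypothetical ensemble: two stumps on x1, one on x2.
model = ensemble([
    stump("x1", 0, -1, 1),
    stump("x2", 5, 2, 4),
    stump("x1", 3, 0, 10),
])


def effect_of_x1(x2):
    """Change in prediction when x1 moves from -1 to 4, holding x2 fixed."""
    return model({"x1": 4, "x2": x2}) - model({"x1": -1, "x2": x2})


# With complete rows the effect of x1 does not depend on x2: no interaction.
print(effect_of_x1(0), effect_of_x1(9))
```

The sketch does not reproduce the missing-value behavior mentioned above, which is exactly the case where 2-node trees can stop being purely additive.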
9.
Then, in the PRO EX version, you can run the BATTERY ADDITIVE procedure, which starts with a fully flexible model and searches for the one variable that can most readily be made additive (interact with nothing). It then searches for a second variable to make additive, and so on, going step by step until all variables are additive. Reviewing the performance curve of this procedure lets you find the optimal balance between fully free interactivity and limited interactivity. If one or more variables really do not interact with any others, then preventing chance interactions from creeping into the model will improve the model on future unseen data.
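The step-by-step search described here is a greedy loop, and a minimal sketch of that idea follows. It is not the SPM implementation: `score` is a hypothetical callable that would fit a model with the given variables constrained to be additive and return holdout performance (higher is better), and the toy scoring rule is invented so the example is self-contained.

```python
def stepwise_additive(variables, score):
    """Greedily make one more variable additive at each step.

    Returns the path [(additive_variables, score), ...], starting from the
    fully flexible model and ending with every variable additive.
    """
    additive = []
    free = list(variables)
    path = [(tuple(additive), score(additive))]
    while free:
        # Pick the variable whose restriction leaves performance highest.
        best = max(free, key=lambda v: score(additive + [v]))
        additive.append(best)
        free.remove(best)
        path.append((tuple(additive), score(additive)))
    return path


def toy_score(additive):
    """Hypothetical holdout score: 'a' and 'b' truly interact, 'c' does not."""
    value = 100
    if "c" in additive:
        value += 3   # chance interactions involving c are removed
    if "a" in additive or "b" in additive:
        value -= 10  # the real a-by-b interaction is lost
    return value


path = stepwise_additive(["a", "b", "c"], toy_score)
for step in path:
    print(step)

# The best point on the performance curve balances free and limited interactivity.
print(max(path, key=lambda step: step[1]))
```

In the toy case the curve peaks after only `c` is made additive, matching the point that restricting variables with no real interactions helps while restricting truly interacting ones hurts.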
10.
For more on TreeNet, visit http://www.salford-systems.com/en/products/treenet