GBM in H2O
1. H2O – The Open Source Math Engine
H2O and Gradient Boosting
2. What is Gradient Boosting?
GBM is a boosted ensemble of decision trees, fitted in a
stagewise forward fashion to minimize a loss function
i.e. a GBM is a sum of decision trees
each new tree corrects the errors of the previous forest
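In symbols (standard notation, added here for clarity; not the slides' own):

F_0(x) = 0, \qquad F_m(x) = F_{m-1}(x) + \nu\, h_m(x), \qquad \hat{F}(x) = \sum_{m=1}^{M} \nu\, h_m(x)

where each h_m is a regression tree fit to the current pseudo-residuals and \nu is the learning rate.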
3. Why gradient boosting
Performs variable selection during the fitting process
• Highly collinear explanatory variables
- GLM: backwards/forwards stepwise selection is unstable
Interactions: searches to a specified depth
Captures nonlinearities in the data
• e.g. airline on-time performance: GBM captures a shift in 2001
without the analyst having to model it (see the sketch below)
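A minimal sketch of that claim on synthetic data (scikit-learn is used only so the example is self-contained, and the year-2001 step is made up to mirror the airlines anecdote): a tree split finds the jump on its own, while a linear fit smears it out.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
year = rng.uniform(1995.0, 2005.0, size=(2000, 1))
# synthetic "delay" with a step change at 2001, as in the anecdote
delay = 10 + 5 * (year[:, 0] >= 2001) + rng.normal(scale=1.0, size=2000)

gbm = GradientBoostingRegressor().fit(year, delay)
glm = LinearRegression().fit(year, delay)

probe = np.array([[2000.5], [2001.5]])
print("gbm:", gbm.predict(probe))  # roughly [10, 15]: the 2001 jump is found
print("glm:", glm.predict(probe))  # in-between values: the jump is smeared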
4. Why gradient boosting, continued
Naturally handles unscaled data (unlike GLM, particularly
with L1/L2 penalties); see the sketch below
Handles ordinal data, e.g.
income: [$10k, $20k], ($20k, $40k], ($40k, $100k], ($100k, inf)
Relatively insensitive to long-tailed distributions and outliers
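A minimal sketch of the unscaled-data point (again scikit-learn, for self-containment): tree splits depend only on the ordering of feature values, so a monotone rescaling such as log leaves the fitted predictions unchanged.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=2.0, size=(500, 1))  # long-tailed feature
y = np.sin(np.log(X[:, 0])) + rng.normal(scale=0.1, size=500)

gbm_raw = GradientBoostingRegressor(random_state=0).fit(X, y)
gbm_log = GradientBoostingRegressor(random_state=0).fit(np.log(X), y)

# identical fits: splits depend on the order of values, not their scale
assert np.allclose(gbm_raw.predict(X), gbm_log.predict(np.log(X)))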
5. Gradient boosting works well
On the right dataset, GBM classification will outperform both
GLM and random forest
Demonstrates good performance on various classification
problems
• Hugh Miller, team leader, winner of the KDD Cup 2009 Slow Challenge:
GBM was the main model used to predict telco customer churn
• KDD Cup 2013 - Author-Paper Identification Challenge: 3 of the 4
winners incorporated GBM
• many Kaggle winners
• results at previous employers
6. Inference algorithm (simplified)
1. Initialize K predictors f_{k,0}(x) = 0, one per class
2. for m = 1:num_trees
a. normalize the current predictions into class probabilities p_k (softmax)
b. for k = 1:num_classes
i. compute the pseudo-residuals r = y_k – p_k
ii. fit a regression tree to targets r with data X
iii. for each terminal region, compute the multiplier that minimizes the
deviance loss
iv. f_{k,m+1}(x) = f_{k,m}(x) + multiplier of the region containing x
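A compact, runnable rendering of this loop (an illustrative NumPy/scikit-learn sketch, not H2O's implementation; steps iii-iv are simplified to the tree's own leaf means with a shrunken update, rather than the exact deviance-optimal multipliers):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_multiclass_gbm(X, y, num_trees=50, learn_rate=0.1, max_depth=3):
    n, K = len(y), int(y.max()) + 1
    Y = np.eye(K)[y]                    # one-hot encoding of integer labels
    F = np.zeros((n, K))                # step 1: K predictors, initialized to 0
    ensemble = []
    for m in range(num_trees):
        # step 2a: normalize current predictions into probabilities (softmax)
        P = np.exp(F - F.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        stage = []
        for k in range(K):              # step 2b: one tree per class
            r = Y[:, k] - P[:, k]       # step i: pseudo-residuals
            tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)  # step ii
            # steps iii-iv, simplified: leaf means stand in for the
            # per-region deviance-optimal multipliers
            F[:, k] += learn_rate * tree.predict(X)
            stage.append(tree)
        ensemble.append(stage)
    return ensemble

Prediction then sums the learn-rate-scaled tree outputs per class and applies the same softmax.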
9. ...but GBM has pain points
Slow to fit
Slow to predict
Data size limitations: downsampling is often required
Many implementations are single-threaded
Parameters are difficult to understand
Fit with searching, choose with a holdout:
• interaction levels / depths: [1, 5, 10, 15]
• trees: [10, 100, 1000, 5000]
• learning rate: [0.1, 0.01, 0.001]
• this is often an overnight job (sketched below)
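A hedged sketch of that overnight sweep with H2O's Python grid-search API (the file path and the assumption that the response is the last column of a classification dataset are placeholders, not from the slides):

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()
data = h2o.import_file("train.csv")   # placeholder: response in last column
x, y = data.columns[:-1], data.columns[-1]
data[y] = data[y].asfactor()          # placeholder classification target
train, valid = data.split_frame(ratios=[0.8], seed=42)

hyper_params = {
    "max_depth":  [1, 5, 10, 15],        # interaction levels / depths
    "ntrees":     [10, 100, 1000, 5000],
    "learn_rate": [0.1, 0.01, 0.001],
}
grid = H2OGridSearch(H2OGradientBoostingEstimator, hyper_params=hyper_params)
grid.train(x=x, y=y, training_frame=train, validation_frame=valid)

# choose with holdout: rank all 4 * 4 * 3 = 48 models by validation logloss
print(grid.get_grid(sort_by="logloss", decreasing=False))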