GBM in H2O
1. H2O – The Open Source Math Engine
H2O and Gradient Boosting
2. What is Gradient Boosting?
GBM is a boosted ensemble of decision trees, fitted in a
stagewise forward fashion to minimize a loss function
i.e. a GBM is a sum of decision trees
each new tree corrects the errors of the previous forest
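In symbols (standard notation, added here for clarity; not the slides' own):

F_0(x) = 0, \qquad F_m(x) = F_{m-1}(x) + \nu\, h_m(x), \qquad \hat{F}(x) = \sum_{m=1}^{M} \nu\, h_m(x)

where each h_m is a regression tree fit to the current pseudo-residuals and \nu is the learning rate.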
3. Why gradient boosting
Performs variable selection during the fitting process
• Highly collinear explanatory variables
- GLM: backwards/forwards stepwise selection is unstable
Interactions: searches to a specified depth
Captures nonlinearities in the data
• e.g. airline on-time performance: GBM captures a shift in 2001
without the analyst having to model it (see the sketch below)
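A minimal sketch of that claim on synthetic data (scikit-learn is used only so the example is self-contained, and the year-2001 step is made up to mirror the airlines anecdote): a tree split finds the jump on its own, while a linear fit smears it out.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
year = rng.uniform(1995.0, 2005.0, size=(2000, 1))
# synthetic "delay" with a step change at 2001, as in the anecdote
delay = 10 + 5 * (year[:, 0] >= 2001) + rng.normal(scale=1.0, size=2000)

gbm = GradientBoostingRegressor().fit(year, delay)
glm = LinearRegression().fit(year, delay)

probe = np.array([[2000.5], [2001.5]])
print("gbm:", gbm.predict(probe))  # roughly [10, 15]: the 2001 jump is found
print("glm:", glm.predict(probe))  # in-between values: the jump is smeared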
4. Why gradient boosting, continued
Naturally handles unscaled data (unlike GLM, particularly
with L1/L2 penalties); see the sketch below
Handles ordinal data, e.g.
income: [$10k, $20k], ($20k, $40k], ($40k, $100k], ($100k, inf)
Relatively insensitive to long-tailed distributions and outliers
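A minimal sketch of the unscaled-data point (again scikit-learn, for self-containment): tree splits depend only on the ordering of feature values, so a monotone rescaling such as log leaves the fitted predictions unchanged.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=2.0, size=(500, 1))  # long-tailed feature
y = np.sin(np.log(X[:, 0])) + rng.normal(scale=0.1, size=500)

gbm_raw = GradientBoostingRegressor(random_state=0).fit(X, y)
gbm_log = GradientBoostingRegressor(random_state=0).fit(np.log(X), y)

# identical fits: splits depend on the order of values, not their scale
assert np.allclose(gbm_raw.predict(X), gbm_log.predict(np.log(X)))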
5. Gradient boosting works well
On the right dataset, GBM classification will outperform both
GLM and random forest
Demonstrates good performance on various classification
problems
• Hugh Miller, team leader, winner of the KDD Cup 2009 Slow Challenge:
GBM was the main model used to predict telco customer churn
• KDD Cup 2013 - Author-Paper Identification Challenge: 3 of the 4
winners incorporated GBM
• many Kaggle winners
• results at previous employers
6. Inference algorithm (simplified)
1. Initialize K predictors f_{k,0}(x) = 0, one per class
2. for m = 1:num_trees
a. normalize the current predictions into class probabilities p_k (softmax)
b. for k = 1:num_classes
i. compute the pseudo-residuals r = y_k – p_k
ii. fit a regression tree to targets r with data X
iii. for each terminal region, compute the multiplier that minimizes the
deviance loss
iv. f_{k,m+1}(x) = f_{k,m}(x) + multiplier of the region containing x
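A compact, runnable rendering of this loop (an illustrative NumPy/scikit-learn sketch, not H2O's implementation; steps iii-iv are simplified to the tree's own leaf means with a shrunken update, rather than the exact deviance-optimal multipliers):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_multiclass_gbm(X, y, num_trees=50, learn_rate=0.1, max_depth=3):
    n, K = len(y), int(y.max()) + 1
    Y = np.eye(K)[y]                    # one-hot encoding of integer labels
    F = np.zeros((n, K))                # step 1: K predictors, initialized to 0
    ensemble = []
    for m in range(num_trees):
        # step 2a: normalize current predictions into probabilities (softmax)
        P = np.exp(F - F.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        stage = []
        for k in range(K):              # step 2b: one tree per class
            r = Y[:, k] - P[:, k]       # step i: pseudo-residuals
            tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)  # step ii
            # steps iii-iv, simplified: leaf means stand in for the
            # per-region deviance-optimal multipliers
            F[:, k] += learn_rate * tree.predict(X)
            stage.append(tree)
        ensemble.append(stage)
    return ensemble

Prediction then sums the learn-rate-scaled tree outputs per class and applies the same softmax.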
9. ...but GBM has pain points
Slow to fit
Slow to predict
Data size limitations: downsampling is often required
Many implementations are single-threaded
Parameters are difficult to understand
Fit with searching, choose with a holdout:
• interaction levels / depths: [1, 5, 10, 15]
• trees: [10, 100, 1000, 5000]
• learning rate: [0.1, 0.01, 0.001]
• this is often an overnight job (sketched below)
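A hedged sketch of that overnight sweep with H2O's Python grid-search API (the file path and the assumption that the response is the last column of a classification dataset are placeholders, not from the slides):

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()
data = h2o.import_file("train.csv")   # placeholder: response in last column
x, y = data.columns[:-1], data.columns[-1]
data[y] = data[y].asfactor()          # placeholder classification target
train, valid = data.split_frame(ratios=[0.8], seed=42)

hyper_params = {
    "max_depth":  [1, 5, 10, 15],        # interaction levels / depths
    "ntrees":     [10, 100, 1000, 5000],
    "learn_rate": [0.1, 0.01, 0.001],
}
grid = H2OGridSearch(H2OGradientBoostingEstimator, hyper_params=hyper_params)
grid.train(x=x, y=y, training_frame=train, validation_frame=valid)

# choose with holdout: rank all 4 * 4 * 3 = 48 models by validation logloss
print(grid.get_grid(sort_by="logloss", decreasing=False))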