Questions tagged [gradient-boosting-decision-trees]

54 questions
6 votes, 1 answer

What is Pruning & Truncation in Decision Trees?

Pruning & Truncation. As per my understanding, truncation means stopping the tree while it is still growing so that it does not end up with leaves containing very few data points. One way to do this is to set a minimum number of training inputs to use on each…
Pluviophile
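For readers comparing the two, a minimal sketch in scikit-learn (dataset and parameter values are illustrative, not from the question): truncation-style pre-pruning stops growth early, while cost-complexity pruning cuts back a fully grown tree.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Truncation (pre-pruning): stop growing so every leaf keeps enough samples.
truncated = DecisionTreeClassifier(min_samples_leaf=20, max_depth=5).fit(X, y)

# Pruning (post-pruning): grow fully, then cut back with cost-complexity pruning.
pruned = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)

print(truncated.get_n_leaves(), pruned.get_n_leaves())
```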
5 votes, 1 answer

Multi-target regression tree with additional constraint

I have a regression problem where I need to predict three dependent variables ($y$) based on a set of independent variables ($x$): $$ (y_1,y_2,y_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n +u. $$ To solve this problem, I would…
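A minimal sketch of the usual baseline, assuming scikit-learn and synthetic data: MultiOutputRegressor fits one boosted model per target, so it does not enforce any cross-target constraint on its own.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                                            # independent variables x
Y = X @ rng.normal(size=(10, 3)) + rng.normal(scale=0.1, size=(500, 3))   # targets y1..y3

model = MultiOutputRegressor(GradientBoostingRegressor()).fit(X, Y)
print(model.predict(X[:5]).shape)  # (5, 3): one column per target
```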
4 votes, 1 answer

XGBoost - Imputing vs keeping NaN

What is the benefit of imputing numerical or categorical features when using DT methods such as XGBoost that can handle missing values? This question is mainly for when the values are missing not at random. An example of missing not at random…
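A minimal sketch of the two options being compared, on synthetic data with injected missingness; values and parameters are illustrative.

```python
import numpy as np
import xgboost as xgb
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=1000)
X[rng.random(X.shape) < 0.2] = np.nan  # inject missing values

# Option 1: keep NaN; XGBoost learns a default direction for missing values per split.
native = xgb.XGBRegressor(n_estimators=100).fit(X, y)

# Option 2: impute first; the "is missing" signal is lost unless added as an indicator.
X_imputed = SimpleImputer(strategy="median").fit_transform(X)
imputed = xgb.XGBRegressor(n_estimators=100).fit(X_imputed, y)
```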
4 votes, 3 answers

Am I building a good or bad prediction model using the Gradient Boosting Classifier algorithm?

I am building a binary classification model using GB Classifier for imbalanced data with an event rate of 0.11% and a sample size of 350,000 records (split into 70% training & 30% testing). I have successfully tuned hyperparameters using GridSearchCV, and…
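With a 0.11% event rate, accuracy is uninformative, so whatever the tuning outcome, it helps to score on an imbalance-aware metric. A minimal sketch (X_train/y_train stand in for the question's 70% split):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

param_grid = {"learning_rate": [0.05, 0.1], "max_depth": [3, 5]}
search = GridSearchCV(
    GradientBoostingClassifier(),
    param_grid,
    scoring="average_precision",       # PR-AUC: far more informative than accuracy here
    cv=StratifiedKFold(n_splits=5),    # preserves the 0.11% event rate in each fold
)
# search.fit(X_train, y_train)
```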
3 votes, 3 answers

Example for Boosting

Can someone tell me exactly how boosting as implemented by LightGBM or XGBoost works in a real-world scenario? I know it splits the tree leaf-wise instead of level-wise, which contributes to the global average, not just the loss of the branch which…
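For reference, a minimal sketch of the two growth policies the excerpt contrasts: LightGBM grows leaf-wise (best-first) by default, while XGBoost grows depth-wise by default but can be switched to leaf-wise. Parameter values are illustrative.

```python
import lightgbm as lgb
import xgboost as xgb

leafwise = lgb.LGBMClassifier(num_leaves=31)               # leaf-wise: cap leaves, not depth
levelwise = xgb.XGBClassifier(max_depth=6)                 # depth-wise (XGBoost default)
leafwise_xgb = xgb.XGBClassifier(tree_method="hist",
                                 grow_policy="lossguide",  # leaf-wise, like LightGBM
                                 max_leaves=31)
```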
2 votes, 1 answer

What if the root of such a tree is pruned in XGBoost?

Extreme Gradient Boosting stops growing a tree if $\gamma$ is greater than the impurity reduction given in eq. (7) (see below). What happens if the tree's root has a negative impurity reduction? I think there is no way for boosting to go on because the next…
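For reference, eq. (7) of the XGBoost paper is the loss reduction of a candidate split, with $G_L, H_L$ and $G_R, H_R$ the sums of first- and second-order gradients in the left and right children:

$$ \mathcal{L}_{split} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma. $$

Note that even if the root split is pruned, the tree is still a single leaf with weight $-G/(H+\lambda)$, which is generally nonzero, so the predictions (and hence the next round's gradients) still change and boosting can continue.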
2 votes, 1 answer

My tree-based models keep overfitting

This is a multi-class classification project. Each model severely overfits: Decision Tree, Random Forest, and especially XGBoost, and the classification report reflects that. where the csv…
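Without the code it is hard to diagnose, but a minimal sketch of the usual overfitting levers for XGBoost (values are illustrative starting points, not tuned recommendations; X_train/X_val are hypothetical):

```python
import xgboost as xgb

model = xgb.XGBClassifier(
    max_depth=4,               # shallower trees generalize better
    min_child_weight=10,       # require more evidence before splitting
    subsample=0.8,             # row subsampling per tree
    colsample_bytree=0.8,      # feature subsampling per tree
    reg_lambda=1.0,            # L2 penalty on leaf weights
    learning_rate=0.05,
    n_estimators=2000,
    early_stopping_rounds=50,  # stop when validation loss stops improving
)
# model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
```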
2 votes, 1 answer

How to determine the feasible domain of a trained tree model?

As far as I know, tree models (such as those trained using xgboost/lightgbm) make reasonable predictions only if the input feature vector is similar to the training set data. If the feature vector looks like an outlier, then the prediction result is…
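One coarse check in the spirit of the question, sketched below: flag inputs that fall outside the per-feature range seen in training. This ignores correlations between features, so it is only a first filter; the function name is hypothetical.

```python
import numpy as np

def in_training_range(X_train, x_new):
    """True if every feature of x_new lies within the range seen in training."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return bool(np.all((x_new >= lo) & (x_new <= hi)))
```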
2 votes, 0 answers

Handling Missing Values in Predictor Variables for Gradient Boosting Models (gbm()) in R

I am currently working on a predictive modeling project using the gbm package in R and have encountered a challenge regarding missing values in one of my predictor variables. I would appreciate your insights and recommendations on the best practices…
Anso
2 votes, 2 answers

Why does the regression model produced by XGBoost depend on the order of the training data when more than 8194 data points are used?

When I use XGBRegressor to construct a boosted tree model from 8194 or fewer data points (i.e., n_train $\leq$ 8194, where n_train is defined in the code below) and randomly shuffle the data points before training, the fit method is order…
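A minimal sketch of the comparison the question describes, on synthetic data: fit the same points in two orders and measure the prediction gap. One explanation that has been offered is that the hist method's quantile sketch becomes order-dependent above a data-size threshold, but treat that as an assumption here.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
n = 10_000                       # above the 8194 threshold the question reports
X = rng.normal(size=(n, 4))
y = X[:, 0] + rng.normal(scale=0.1, size=n)

perm = rng.permutation(n)
m1 = xgb.XGBRegressor(tree_method="hist").fit(X, y)
m2 = xgb.XGBRegressor(tree_method="hist").fit(X[perm], y[perm])
print(np.max(np.abs(m1.predict(X) - m2.predict(X))))  # nonzero => order-dependent fit
```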
2 votes, 0 answers

Transfer learning for tabular data

I wonder if transfer learning can be used with tabular data similarly to how it's used in neural networks for image recognition. My idea would be to train a "general" model and then "localize" it using a specific dataset. I have a problem akin to this…
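One way to sketch the "train general, then localize" idea with boosted trees is to continue boosting from an existing model, which XGBoost supports via the xgb_model argument. X_general/X_local below are hypothetical datasets:

```python
import xgboost as xgb

general = xgb.XGBRegressor(n_estimators=200)
# general.fit(X_general, y_general)           # "general" model on the large dataset

local = xgb.XGBRegressor(n_estimators=50, learning_rate=0.01)
# local.fit(X_local, y_local,                 # "localize": keep boosting on the new data
#           xgb_model=general.get_booster())
```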
2 votes, 0 answers

Tuning the learning rate parameter for GBDT models

I've always been taught that decreasing the learning rate parameter in GBDT models such as XGBoost, LightGBM and CatBoost will improve the out-of-sample performance, assuming the number of iterations is increased accordingly and all else…
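A minimal sketch of the trade-off described: lower the learning rate, raise the iteration budget, and let early stopping pick the actual number of trees on a validation set. The values and the X_train/X_val names are illustrative.

```python
import lightgbm as lgb

fast = lgb.LGBMRegressor(learning_rate=0.1, n_estimators=500)
slow = lgb.LGBMRegressor(learning_rate=0.01, n_estimators=5000)  # 10x smaller step, 10x budget
# for m in (fast, slow):
#     m.fit(X_train, y_train, eval_set=[(X_val, y_val)],
#           callbacks=[lgb.early_stopping(stopping_rounds=100)])
```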
2 votes, 1 answer

Model performance impact on social discrimination?

I am currently working on a project where the data concerns people, and the dataset contains personal data with sensitive attributes (typically: age, sex, handicap, race). Now it seems there are mainly three options for modelling: not take the…
Lucas Morin
2 votes, 1 answer

Random LightGBM Forest

I'm not completely sure about the bias/variance properties of boosted decision trees (LightGBM especially), so I wonder whether we would generally expect a performance boost from an ensemble of multiple LightGBM models, just as with Random Forest?
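Two routes worth distinguishing, sketched below with illustrative parameters: LightGBM's built-in random-forest mode (bagging instead of boosting), and a hand-rolled average of boosted models trained with different seeds.

```python
import numpy as np
import lightgbm as lgb

# Route 1: LightGBM as a random forest; rf mode requires row subsampling.
rf = lgb.LGBMRegressor(boosting_type="rf", subsample=0.8, subsample_freq=1,
                       colsample_bytree=0.8)

# Route 2: average several boosted models that differ only by seed.
models = [lgb.LGBMRegressor(random_state=s) for s in range(5)]
# preds = np.mean([m.fit(X_train, y_train).predict(X_test) for m in models], axis=0)
```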
1 vote, 0 answers

Why is the average prediction moving away from the average response for a reg:gamma model?

I'm predicting a response that I would typically model under a gamma distribution, with relatively simple parameters. I'm just using the defaults other than these: learning_rate = 0.01, max_depth = 6, base_score = the average of y. Since my base_score…
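A minimal sketch of the diagnostic the question implies, on synthetic gamma-distributed data: set base_score to the mean of y and compare the average prediction with the average response after fitting.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = rng.gamma(shape=2.0, scale=np.exp(0.3 * X[:, 0]))  # positive response

model = xgb.XGBRegressor(objective="reg:gamma", learning_rate=0.01,
                         max_depth=6, base_score=float(y.mean()))
model.fit(X, y)
print(y.mean(), model.predict(X).mean())  # watch whether the means drift apart
```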