Questions tagged [xgboost]
For questions related to the eXtreme Gradient Boosting algorithm.
702 questions
83 votes · 5 answers
GBM vs XGBOOST? Key differences?
I am trying to understand the key differences between GBM and XGBOOST. I tried to google it, but could not find any good answers explaining the differences between the two algorithms and why xgboost almost always performs better than GBM. What makes…
Aman
61 votes · 6 answers
Does XGBoost handle multicollinearity by itself?
I'm currently using XGBoost on a data set with 21 features (selected from a list of some 150 features), then one-hot encoded them to obtain ~98 features. A few of these 98 features are somewhat redundant; for example, a variable (feature) $A$ also…
neural-nut
58 votes · 2 answers
How to interpret the output of XGBoost importance?
I ran an xgboost model. I don't know exactly how to interpret the output of xgb.importance.
What is the meaning of Gain, Cover, and Frequency and how do we interpret them?
Also, what does Split, RealCover, and RealCover% mean? I have some extra…
user14204
44 votes · 2 answers
LightGBM vs XGBoost
I'm trying to understand which is better (more accurate, especially in classification problems). I've been searching for articles comparing LightGBM and XGBoost but found only…
Sergey Nizhevyasov
42 votes · 6 answers
Unbalanced multiclass data with XGBoost
I have 3 classes with this distribution:
Class 0: 0.1169
Class 1: 0.7668
Class 2: 0.1163
And I am using xgboost for classification. I know that there is a parameter called scale_pos_weight.
But how is it handled for the 'multiclass' case, and how can…
shda
41 votes · 4 answers
Why do we need XGBoost and Random Forest?
I wasn't clear on a couple of concepts:
XGBoost converts weak learners into strong learners. What's the advantage of doing this? Combining many weak learners instead of just using a single tree?
Random Forest uses various samples from the data to create…
John Constantine
38 votes · 3 answers
Is it necessary to normalize data for XGBoost?
MinMaxScaler() in scikit-learn is used for data normalization (a.k.a. feature scaling). Data normalization is not necessary for decision trees. Since XGBoost is based on decision trees, is it necessary to do data normalization using MinMaxScaler()…
user781486
36 votes · 3 answers
xgboost: give more importance to recent samples
Is there a way to add more importance to points which are more recent when analyzing data with xgboost?
kilojoules
36 votes · 1 answer
Why is xgboost so much faster than sklearn GradientBoostingClassifier?
I'm trying to train a gradient boosting model over 50k examples with 100 numeric features. XGBClassifier handles 500 trees within 43 seconds on my machine, while GradientBoostingClassifier handles only 10 trees(!) in 1 minute and 2 seconds :( I…
ihadanny
33 votes · 3 answers
Hypertuning XGBoost parameters
XGBoost has been doing a great job when it comes to dealing with both categorical and continuous dependent variables. But how do I select the optimal parameters for an XGBoost problem?
This is how I applied the parameters for a recent Kaggle…
Dawny33
26 votes · 4 answers
How to predict probabilities in xgboost using R?
The predict function below is giving negative values as well, so they cannot be probabilities.
param <- list(max.depth = 5, eta = 0.01, objective = "binary:logistic", subsample = 0.9)
bst <- xgboost(params = param, data = x_mat, label = y_mat, nrounds = 3000)
pred_s <-…
GeorgeOfTheRF
26 votes · 2 answers
How to fit pairwise ranking models in XGBoost?
As far as I know, to train learning to rank models, you need to have three things in the dataset:
label or relevance
group or query id
feature vector
For example, the Microsoft Learning to Rank dataset uses this format (label, group id, and…
tokestermw
25 votes · 4 answers
Is feature engineering still useful when using XGBoost?
I was reading the material related to XGBoost. It seems that this method does not require any variable scaling, since it is tree-based and can capture complex non-linear patterns and interactions. And it can handle both numerical and…
KevinKim
25 votes · 3 answers
Pandas Dataframe to DMatrix
I am trying to run xgboost in scikit-learn, and I am only using Pandas to load the data into a dataframe. How am I supposed to use a pandas df with xgboost? I am confused by the DMatrix routine required to run the xgboost algorithm.
Ghostintheshell
24 votes · 1 answer
Lightgbm vs xgboost vs catboost
I've seen that in Kaggle competitions people are using LightGBM where they used to use xgboost. My question is: when would you rather use xgboost instead of LightGBM? What about CatBoost?
David Masip