I ran a xgboost model. I don't exactly know how to interpret the output of xgb.importance.
What is the meaning of Gain, Cover, and Frequency and how do we interpret them?
Also, what does Split, RealCover, and RealCover% mean? I have some extra parameters here
Are there any other parameters that can tell me more about feature importances?
From the R documentation, I have some understanding that Gain is something similar to Information gain and Frequency is number of times a feature is used across all the trees. I have no idea what Cover is.
I ran the example code given in the link (and also tried doing the same on the problem that I am working on), but the split definition given there did not match with the numbers that I calculated.
importance_matrix
Output:
Feature Gain Cover Frequence
1: xxx 2.276101e-01 0.0618490331 1.913283e-02
2: xxxx 2.047495e-01 0.1337406946 1.373710e-01
3: xxxx 1.239551e-01 0.1032614896 1.319798e-01
4: xxxx 6.269780e-02 0.0431682707 1.098646e-01
5: xxxxx 6.004842e-02 0.0305611830 1.709108e-02
214: xxxxxxxxxx 4.599139e-06 0.0001551098 1.147052e-05
215: xxxxxxxxxx 4.500927e-06 0.0001665320 1.147052e-05
216: xxxxxxxxxxxx 3.899363e-06 0.0001536857 1.147052e-05
217: xxxxxxxxxxxxxx 3.619348e-06 0.0001808504 1.147052e-05
218: xxxxxxxxxxxxx 3.429679e-06 0.0001792233 1.147052e-05

