
I have a large dataset on which I need to make a binary prediction. After analyzing the data, I found that some variables are positively correlated with each other. Should I delete some of these variables and keep the others (i.e., if A and B are correlated, should I delete A and keep B) before continuing, or is there a better way to deal with this kind of problem?

2 Answers


It depends. If you are using this data with a linear model, it is better to remove correlated features. Some more complex non-linear models, however, can either make use of correlated features or effectively ignore them automatically.
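To illustrate why linear models are sensitive here, the following is a minimal sketch on synthetic data. The variables `a`, `b`, `c` and the helper `condition_number` are made up for this example and are not part of the answer above. It compares the condition number of a design matrix with two nearly identical features against one with two independent features; a large condition number means the least-squares coefficients are poorly determined.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not from the question)
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = a + 0.05 * rng.normal(size=n)  # B is almost a copy of A
c = rng.normal(size=n)             # C is independent of A


def condition_number(*columns):
    """Condition number of the standardized design matrix."""
    X = np.column_stack(columns)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.linalg.cond(X)


cond_correlated = condition_number(a, b)
cond_independent = condition_number(a, c)

print(f"corr(A, B) = {np.corrcoef(a, b)[0, 1]:.3f}")
print(f"condition number with A, B: {cond_correlated:.1f}")
print(f"condition number with A, C: {cond_independent:.1f}")
```

The first condition number should come out far larger than the second, which is the numerical symptom of the coefficient instability that motivates dropping one of the correlated features.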

SrJ

Yes, you have to remove one of them. For example, if you plot a heatmap and notice that two features A and B have a correlation of 0.91, remove only one of them, since removing both would lead to information loss.

After removing one of them, plot a heatmap of the remaining features again and check which pairs are still highly correlated, since dropping one feature can resolve several correlated pairs at once. So it is an iterative process. Now let's say you have 4 correlated features A, B, C and D. Instead of removing 2 of them straight away, first remove one (either A or B) and then plot the heatmap again. Only if C and D are still correlated should you remove one of them.
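The iterative process described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function `drop_correlated`, the 0.9 threshold, and the synthetic features A to D are hypothetical, and which feature of a pair gets dropped is an arbitrary choice here.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.9):
    """Iteratively drop one feature from the most correlated pair.

    X is an (n_samples, n_features) array and names holds the matching
    column labels. Returns the labels of the features that were kept.
    """
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        # Recompute absolute correlations on the remaining features only
        corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
        np.fill_diagonal(corr, 0.0)  # ignore self-correlation
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[i, j] < threshold:
            break  # no remaining pair exceeds the threshold
        # Arbitrary choice: drop the later column of the pair
        del keep[max(i, j)]
    return [names[k] for k in keep]


# Synthetic example: B is a noisy copy of A, D is a noisy copy of C
rng = np.random.default_rng(0)
a = rng.normal(size=300)
c = rng.normal(size=300)
X = np.column_stack([
    a,
    a + 0.1 * rng.normal(size=300),
    c,
    c + 0.1 * rng.normal(size=300),
])
kept = drop_correlated(X, ["A", "B", "C", "D"], threshold=0.9)
print(kept)
```

Only one feature is removed per pass and the correlation matrix is recomputed afterwards, which mirrors the "remove one, then look at the heatmap again" advice rather than dropping every flagged feature in a single step.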

spectre