Normalize / Standardize in a Random Forest?

Question

If I have a matrix of co-occurring words in conversations of different lengths, is it appropriate to standardize / normalize the data prior to training?

My matrix is set up as follows: one row per two-person conversation, with one column per word that co-occurs between the speakers. I cannot help but think that, since a longer conversation will likely contain more shared words than a shorter one, I should factor this in somehow.

score 4 · Accepted Answer · answered Oct 21 '19 at 13:20

Thanks for the clarification in the comments. Tree-based models do not care about the absolute value that a feature takes, only about the order of the values: each split compares one feature against a threshold, so any rescaling that preserves the ordering of that feature leaves the splits unchanged. Normalization matters mainly for linear models, k-NN, and neural networks, because they are affected by the absolute values the features take.

You don't need to normalize/standardize.
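To make the "only the order matters" point concrete, here is a minimal sketch of a single tree split on one feature, using only the standard library. It is not how any particular library implements its trees; the `gini` and `best_split` helpers and the word counts are made up for illustration. It finds the best threshold split on the raw counts and again on the standardized counts, then compares which rows end up in the left child.

```python
from statistics import mean, pstdev


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return the row indices sent to the left child by the threshold
    split with the lowest weighted Gini impurity on a single feature."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_score, best_left = float("inf"), frozenset()
    for k in range(1, len(order)):
        lo, hi = values[order[k - 1]], values[order[k]]
        if lo == hi:
            continue  # no threshold can separate equal values
        left, right = order[:k], order[k:]
        score = (
            len(left) * gini([labels[i] for i in left])
            + len(right) * gini([labels[i] for i in right])
        ) / len(order)
        if score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left


# Hypothetical counts of one shared word across eight conversations,
# with a made-up binary label per conversation.
counts = [0, 1, 1, 3, 4, 7, 9, 12]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

# Standardize the feature: subtract the mean, divide by the std. deviation.
mu, sigma = mean(counts), pstdev(counts)
standardized = [(c - mu) / sigma for c in counts]

raw_left = best_split(counts, labels)
std_left = best_split(standardized, labels)

# The impurity of each candidate split depends only on which rows fall on
# each side, and standardization keeps the rows in the same order.
print(raw_left == std_left)
```

The threshold itself changes after scaling, but the rows on each side of it do not, so the tree makes the same decisions either way.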

Check this post.

