
I am trying to cluster a dataset with 24 categorical features. I have done some research and found a lot of people recommending K-Modes. I ran K-Modes on my data, and the best run had a cost of 27069.0, which seems pretty high.

Some of my features have only a few values, such as P, O, C, T, so I thought I could encode them. But others have many different values. Any tips on a clustering algorithm or some other approach? I would like to use Python.

EDIT: What about using Gower distance on the data and then using K-Means on that?

formicaman

1 Answer


You can one-hot encode all your features first. This leaves you with a sparse, high-dimensional feature space. To resolve this, you can use an auto-encoder to map the one-hot vectors to a denser, lower-dimensional space, and then run a clustering method such as k-means on the encoded data.
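A minimal sketch of that pipeline, under several assumptions that are mine rather than anything from your data: the toy array `X_cat` stands in for your 24 columns, scikit-learn's `MLPRegressor` trained to reproduce its own input serves as a small auto-encoder (a dedicated deep-learning library would be the more usual choice), and the bottleneck size of 4 and the 3 clusters are arbitrary placeholders you would need to tune.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: 300 rows, 6 categorical columns.
X_cat = rng.choice(["P", "O", "C", "T"], size=(300, 6))

# Step 1: one-hot encode every column (the default output is sparse, so densify it).
X_onehot = OneHotEncoder().fit_transform(X_cat).toarray()

# Step 2: train a one-hidden-layer network to reconstruct its own input.
# The 4-unit hidden layer is the low-dimensional bottleneck (size is an assumption).
autoencoder = MLPRegressor(
    hidden_layer_sizes=(4,),
    activation="tanh",
    max_iter=500,
    random_state=0,
)
autoencoder.fit(X_onehot, X_onehot)

# Step 3: apply only the encoder half (input -> hidden layer) to get dense codes.
X_dense = np.tanh(X_onehot @ autoencoder.coefs_[0] + autoencoder.intercepts_[0])

# Step 4: cluster the dense codes with k-means (3 clusters is an arbitrary choice).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_dense)
```

With real data you would replace `X_cat` with your own columns and compare several bottleneck sizes and cluster counts, since the quality of the clusters depends on how well the encoder preserves the structure of the categories.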

OmG