8

I have a dataset of around 200 features. Most of them are categorical and only a few are numerical. It seems that a neural network with an autoencoder has some problems with that kind and number of features. Therefore, I thought of using PCA to reduce the dimensions and then applying the autoencoder afterwards.

Does the combination of PCA before a neural net make sense, given that the neural net also reduces information in its internal layers? Does anybody have experience with such a combination?


Edit: There is also an interesting answer in the stats forum.

Green Falcon
Rene B.

2 Answers

5

Neural networks are actually extremely effective at performing dimensionality reduction. A great example is word2vec, which applies a shallow neural network to reduce inputs on the order of several million features (i.e. unnormalized text) to a 30-150 dimensional vector, via a process that is mathematically analogous to matrix factorization (the class of techniques PCA belongs to). Autoencoders function very similarly to word2vec. If you're planning on using an autoencoder to learn an embedding of your data, I wouldn't expect you to gain anything from applying PCA first; the autoencoder (or something similar) can learn a better embedding that isn't constrained by the assumptions of PCA.
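For what it's worth, here is a minimal sketch of that idea, assuming Keras and a feature matrix with the categorical columns already one-hot encoded; the placeholder data, layer sizes, and embedding dimension are illustrative assumptions, not part of the answer:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder data: replace with your own feature matrix
# (categorical columns already one-hot encoded).
X = np.random.rand(1000, 200).astype("float32")

n_features = X.shape[1]
embedding_dim = 32  # assumed target dimensionality, not a recommendation

inputs = keras.Input(shape=(n_features,))
encoded = layers.Dense(128, activation="relu")(inputs)
encoded = layers.Dense(embedding_dim, activation="relu")(encoded)
decoded = layers.Dense(128, activation="relu")(encoded)
decoded = layers.Dense(n_features, activation="linear")(decoded)

autoencoder = keras.Model(inputs, decoded)  # trained to reconstruct X
encoder = keras.Model(inputs, encoded)      # used afterwards to embed X

autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=20, batch_size=64, validation_split=0.1)

X_embedded = encoder.predict(X)  # learned low-dimensional representation
```

The encoder half is what replaces PCA here: it is trained end to end on the reconstruction objective rather than being limited to linear projections.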

David Marx
5

PCA is used to discard redundant features. It finds the directions along which your data varies the most, and during this process it does not take the labels of your data into account. In the resulting space the data is spread out along those directions and the new features are uncorrelated. That is all PCA does; in that feature space your data may or may not be easily separable. But applying it before a neural net is a way to remove redundant features that would otherwise give your net too many parameters. It is a kind of pre-processing for reducing correlated features, although there are other reasons to apply PCA, such as data visualization, understanding the data, or reporting it based on e.g. the three main components. I recommend trying it, because you may find that in the new space you can classify your data with a much smaller net, as in the sketch below.
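A rough sketch of that pre-processing step, assuming scikit-learn; the placeholder data, the 0.95 variance threshold, and the network size are illustrative assumptions only:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# Placeholder data: replace with your own (one-hot encoded) features and labels.
rng = np.random.default_rng(0)
X = rng.random((1000, 200))
y = rng.integers(0, 2, size=1000)

model = make_pipeline(
    StandardScaler(),        # PCA is sensitive to feature scale
    PCA(n_components=0.95),  # keep components explaining ~95% of the variance
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
model.fit(X, y)
print("components kept:", model.named_steps["pca"].n_components_)
```

The number of components kept tells you how much redundancy PCA found, and the downstream net only ever sees that reduced, decorrelated representation.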

Does the combination of PCA before a neural net make sense ...

Yes, you can do that as a pre-processing stage.

... given that the neural net also reduces information in its internal layers?

Neural nets do not necessarily reduce the information in internal layers. In convolutional nets, max-pooling layers do discard some unnecessary information, but the other common layers, such as convolutional or dense layers, try to transform the space of their inputs; put another way, they try to find features and patterns in which the data becomes separable. PCA, by contrast, removes correlated features.

Does anybody have experience with such a combination?

Yes. It all depends on your data. In some applications it worked well for me, where I had many correlated features. But it has also happened that it did not work very well, both with and without correlated features.

Green Falcon