I have found that a naive Bayes classifier performs much better than classification using a mixture of multivariate Gaussians.
Here is the problem: I have a set of objects, each described by 10 features. Part of the set is a training set with pre-assigned labels; the rest needs to be classified into 3 classes.
I have tested two methods:
(a) approximating the distribution of each feature within a class by a 1-dimensional Gaussian, computing the class likelihood of each feature separately, combining them via Bayes' theorem into a final (posterior) probability, and classifying by maximum posterior probability;
(b) approximating the distribution of the full 10-dimensional feature vectors within each class by a single multivariate Gaussian and again classifying by maximum posterior probability (see the sketch below).
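To make the setup concrete, here is a minimal sketch of the two methods using scikit-learn; GaussianNB corresponds to (a) and QuadraticDiscriminantAnalysis (one full-covariance Gaussian per class) to (b). The synthetic data and variable names are placeholders for illustration only, not my actual data or code.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Placeholder data: X_train is an (n_train x 10) feature matrix,
# y_train holds labels in {0, 1, 2}, X_test are the objects to classify.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 10))
y_train = rng.integers(0, 3, size=300)
X_test = rng.normal(size=(100, 10))

# Method (a): each feature modelled as an independent 1-D Gaussian per class,
# likelihoods combined via Bayes' theorem, label = argmax of the posterior.
nb = GaussianNB()
nb.fit(X_train, y_train)
labels_a = nb.predict(X_test)

# Method (b): one 10-D Gaussian with a full covariance matrix per class,
# classification again by maximum posterior probability.
qda = QuadraticDiscriminantAnalysis(store_covariance=True)
qda.fit(X_train, y_train)
labels_b = qda.predict(X_test)
```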
I have found that method (a) performs much better. What may be the reason for this? I expected the two to perform at least equally well.
Does it have anything to do with the topology of the distribution of the features in the training set?