
The main idea of k-Nearest-Neighbour is to take the $k$ nearest points into account and decide the classification of a data point by majority vote. If so, then it should not have problems with high-dimensional data, because methods like locality-sensitive hashing can find nearest neighbours efficiently.
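For concreteness, here is a minimal sketch of that majority-vote rule in Python/NumPy with a brute-force Euclidean search (the function name `knn_predict` and the choice of Euclidean distance are my own assumptions, not taken from any particular library):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)  # tally the class labels of those neighbours
    return votes.most_common(1)[0][0]             # label receiving the most votes

# Example: three 2-D training points with labels 0/1, classify the origin
X = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]])
y = np.array([0, 1, 0])
print(knn_predict(X, y, np.array([0.0, 0.0]), k=3))  # -> 0
```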

In addition, feature selection with Bayesian networks can reduce the dimensionality of the data and make learning easier.

However, this review paper by John Lafferty on statistical learning points out that non-parametric learning in high-dimensional feature spaces is still an unsolved challenge.

What is going wrong?

Strin

2 Answers


This problem is known as the curse of dimensionality. Basically, as you increase the number of dimensions $d$, points in the space tend to become far away from all the other points. This makes partitioning the space (as is necessary for classification or clustering) very difficult.

You can see this for yourself very easily. I generated $50$ random $d$-dimensional points in the unit hypercube at $20$ evenly spaced values of $d$ from $1$ to $1000$. For each value of $d$, I computed the distance from the first point to all the others and took the average of these distances. Plotting this, we can see that the average distance increases with dimensionality, even though the range of each coordinate stays the same.

[Plot: average distance vs. dimensionality]
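A minimal sketch of that experiment in NumPy (my own reproduction, not the original code; the random seed and the exact spacing of the $d$ values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
dims = np.linspace(1, 1000, 20, dtype=int)  # 20 evenly spaced dimensionalities from 1 to 1000

for d in dims:
    pts = rng.random((50, d))                         # 50 random points in the d-dimensional unit hypercube
    dists = np.linalg.norm(pts[1:] - pts[0], axis=1)  # distance from the first point to the other 49
    print(f"d = {d:4d}: average distance = {dists.mean():.2f}")
```

For two uniform random points in the unit hypercube, the expected squared distance is $d/6$, so the average distance grows roughly like $\sqrt{d/6}$, consistent with the increasing trend described above.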

Nick

Not a complete answer, but the Wikipedia page you cited states:

The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance.

The likelihood of this occurring increases in the presence of high-dimensional feature spaces.

Dave Clarke