
I've searched quite a bit and haven't landed on any useful results.

The problem statement is: Given a set of vectors, I wish to find its approximate k-nearest neighbors. The caveat is that each of my dimensions represents a different entity, so we cannot use the same weight for every dimension when computing the distance. Thus, solutions like a k-d tree don't work as-is.
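(Concretely, assume the weighted Euclidean form used in the answers below: `d(x, q) = sqrt(sum_i (w_i * (x_i - q_i))^2)`, with a separate weight `w_i` for each dimension `i`.)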

Is there any data structure or alternative algorithm that I can use to find such approximate weighted k-nearest neighbors?

Note: Multiplying the initial input data by the weights up front, so as to reduce the problem to uniform weights, is not an option.

2 Answers


I strongly recommend using scaling as described in the other answer, because it is much faster than a custom Python metric. If for some reason scaling/preprocessing is not available, you can use the `metric` parameter to pass a custom weighting function. See the example below.

```
import numpy as np
from sklearn.neighbors import KNeighborsClassifier as KNN

arr = np.random.randn(500, 10)         # train X data
y = np.random.randint(2, size=(500,))  # train y data

# define a custom weighted distance function
weights = np.abs(np.random.randn(10))  # one weight per dimension

def weighted_distance(sample_x, sample_y):
    # weighted Euclidean distance: sqrt(sum_i (w_i * (x_i - y_i))^2)
    return np.sqrt(sum((w * (x - y)) ** 2
                       for w, x, y in zip(weights, sample_x, sample_y)))

knn = KNN(n_neighbors=3, metric=weighted_distance)
knn.fit(arr, y)
test = np.random.randn(5, 10)  # validation or test data
knn.predict(test)              # predict
```
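Why scaling is faster: with this metric, weighting is algebraically equivalent to multiplying every vector by `weights` once and then searching with the default, optimized Euclidean metric, since `||w * (x - q)|| == ||w * x - w * q||` element-wise. A minimal sketch of that route, reusing `arr`, `y`, `weights`, and `test` from above:

```
# Pre-scale the data once, then use the fast built-in Euclidean metric.
knn_fast = KNN(n_neighbors=3)     # default metric is Euclidean
knn_fast.fit(arr * weights, y)    # scale training data by the weights
knn_fast.predict(test * weights)  # queries must be scaled the same way
```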

kate-melnykova

As per @an6u5's comment:

If you want to weight one dimension higher than the others, then I suggest you standardize all of your data so that each dimension has mean zero and standard deviation one. Then multiply the more important dimensions by a factor (2-10) so that differences along them appear farther apart to the KNN distance metric, and leave the less important dimensions un-scaled. Note that both standardizing and scaling are completely reversible processes, so there is very little reason not to use this simple solution.
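A minimal sketch of that recipe, assuming scikit-learn's `StandardScaler` and an arbitrary, illustrative choice of dimension 0 as the most important one:

```
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X = np.random.randn(500, 10)           # raw training data
y = np.random.randint(2, size=(500,))

# Standardize: every dimension gets mean 0 and standard deviation 1.
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Up-scale the important dimension(s); dimension 0 gets a hypothetical
# factor of 5 here, the remaining dimensions stay un-scaled.
factors = np.ones(X.shape[1])
factors[0] = 5.0

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_std * factors, y)

X_new = np.random.randn(5, 10)  # queries get the same transform
knn.predict(scaler.transform(X_new) * factors)
```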

tomglynch