
I am using the code below to compute the cosine similarity between two vectors. It returns a matrix instead of the single value 0.8660254:

[[1.        0.8660254]
 [0.8660254 1.       ]]

from sklearn.metrics.pairwise import cosine_similarity
vec1 = [1,1,0,1,1]
vec2 = [0,1,0,1,1]
print(cosine_similarity([vec1, vec2]))
Green Falcon
Olivia Brown

2 Answers


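According to the documentation, cosine_similarity(X, Y=None, dense_output=True) returns an array of shape (n_samples_X, n_samples_Y). Your mistake is that you are passing [vec1, vec2] as the first input, so both vectors are treated as rows of X and every row is compared with every row. Pass each vector as its own two-dimensional input instead. NumPy arrays are used here, although nested lists work as well: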

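from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Each vector is a 2D array of shape (1, n_features): one sample per input.
vec1 = np.array([[1, 1, 0, 1, 1]])
vec2 = np.array([[0, 1, 0, 1, 1]])
# Instead of cosine_similarity([vec1, vec2]), pass X and Y separately.
print(cosine_similarity(vec1, vec2))  # [[0.8660254]]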
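
To see where the 2x2 matrix in the question comes from, here is a small sketch comparing the two call styles (the names pairwise and single are only illustrative). When Y is omitted, every row of X is compared with every row of X, including itself, which is why the diagonal is 1.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

vec1 = [1, 1, 0, 1, 1]
vec2 = [0, 1, 0, 1, 1]

# One input with two rows: Y defaults to X, so the result is 2x2.
pairwise = cosine_similarity([vec1, vec2])
print(pairwise.shape)  # (2, 2)

# Two inputs with one row each: the result is 1x1.
single = cosine_similarity([vec1], [vec2])
print(single.shape)  # (1, 1)
```

The off-diagonal entries of the 2x2 result hold the same number as the single entry of the 1x1 result.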

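The documentation describes the input as:

X : ndarray or sparse array, shape (n_samples_X, n_features). Input data.

So each input needs an explicit sample dimension. A flat array does not have one:

np.array([1, 2]).shape

gives

(2,)

which is one-dimensional. Wrapping the vector in an extra pair of brackets, as above, gives shape (1, 2): one sample with two features.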
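
If the vectors already exist as flat arrays, one way to get the two-dimensional shape the documentation asks for is NumPy's reshape. This is only a sketch; the names flat and row are made up for illustration.

```python
import numpy as np

flat = np.array([1, 1, 0, 1, 1])
print(flat.shape)  # (5,)

# reshape(1, -1) means: one row, infer the number of columns.
row = flat.reshape(1, -1)
print(row.shape)  # (1, 5)

# Wrapping the list in an extra pair of brackets gives the same shape.
print(np.array([[1, 1, 0, 1, 1]]).shape)  # (1, 5)
```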

Green Falcon
from sklearn.metrics.pairwise import cosine_similarity

vec1 = [1, 1, 0, 1, 1]
vec2 = [0, 1, 0, 1, 1]
print(cosine_similarity([vec1], [vec2]))  # [[0.8660254]]

I passed vec2 as the second argument (Y) and got a single value back, wrapped in a 1x1 array: [[0.8660254]].
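
If a plain Python float is wanted rather than a 1x1 array, one option is to index into the result. The sketch below also checks that value against the textbook definition, the dot product divided by the product of the norms, computed with NumPy. The names score, manual, a and b are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

vec1 = [1, 1, 0, 1, 1]
vec2 = [0, 1, 0, 1, 1]

# The result is a (1, 1) array; [0, 0] pulls out the single entry.
score = float(cosine_similarity([vec1], [vec2])[0, 0])

# Same quantity from the definition: dot product over product of norms.
a, b = np.array(vec1), np.array(vec2)
manual = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(score, 7), round(manual, 7))  # 0.8660254 0.8660254
```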

Stephen Rauch
mamuni