Highest Voted 'distance' Questions - Data Science Stack Exchange

46

votes

6 answers

When would one use Manhattan distance as opposed to Euclidean distance?

I am trying to look for a good argument on why one would use the Manhattan distance over the Euclidean distance in machine learning. The closest thing I found to a good argument so far is on this MIT lecture. At 36:15 you can see on the slides the…

machine-learning classification distance

asked Jun 30 '17 at 06:28

Bitcoin Cash - ADA enthusiast

639
1
7
12

29

votes

1 answer

What is Hellinger Distance and when to use it?

I am interested in knowing what really happens in Hellinger Distance (in simple terms). Furthermore, I am also interested in knowing what are types of problems that we can use Hellinger Distance? What are the benefits of using Hellinger Distance?

machine-learning data-mining text-mining distance

asked Aug 31 '17 at 02:11

Smith Volka

685
2
6
13

12

votes

1 answer

Finding linear transformation under which distance matrices are similar

I have $n$ sets of vectors, where each set $S_i$ contains $k$ vectors in $\mathbb{R}^d$. I know there is some unknown linear transformation $W$ under which the distance matrix $D_i$ (a $k\times k$ matrix) is approximately "the same" (i.e. has a low…

machine-learning metric distance linear-algebra mathematics

asked Jul 22 '19 at 14:24

user1767774

33
6

9

votes

1 answer

Why is the cosine distance used to measure the similatiry between word embeddings?

While computing the similarity between the words, cosine similarity or distance is computed on word vectors. Why aren't other distance metrics such as Euclidean distance suitable for this task. Let us consider 2 vectors a and b. Where, a = [-1,2,-3]…

word-embeddings distance cosine-distance

asked Sep 03 '20 at 12:45

Ashwin Geet D'Sa

1,217
2
11
20

8

votes

1 answer

Cosine Distance > 1 in scipy

I am working on a recommendation engine, and I have chosen to use SciPy's cosine distance as a way of comparing items. I have two vectors: a = [2.7654870801855078, 0.35995355443076027, 0.016221679989074141, -0.012664358453398751,…

python distance cosine-distance

asked Oct 13 '15 at 22:23

redgem

183
1
1
4

8

votes

2 answers

Fixing data inconsistencies

I'm trying to analyze some data I have but there is a lot of inconsistencies in my data. I have a SQL table that I'm trying to analyze. The table is a table of universities with the following structure: name:string, city:string, state:string,…

data-cleaning similarity distance

asked Apr 07 '16 at 22:38

bl0b

183
4

8

votes

3 answers

Levenshtein distance vs simple for loop

I have recently begun studying different data science principles, and have had a particular interest as of late in fuzzy matching. For preface, I'd like to include smarter fuzzy searching in a proprietary language named "4D" in my workplace, so…

distance efficiency fuzzy-logic

asked Feb 19 '22 at 14:32

Jadon Steinmetz

83
5

7

votes

2 answers

Coordinate System's influence on $L$ distances (Manhattan and Euclidean)

I don't understand this picture, which says if we change the coordinate system, we would have the same result for $L_2$ distance, whereas, our result would differ for $L_1$ distance. What does it mean by coordinate system? $(0,0)$ if yes, the…

distance k-nn manhattan

asked May 24 '19 at 00:01

Fatemeh Asgarinejad

1,198
1
10
18

7

votes

5 answers

Is there a way to measure correlation between two similar datasets?

Let's say that I have two similar datasets with the same size of elements, for example 3D points : Dataset A : { (1,2,3), (2,3,4), (4,2,1) } Dataset B : { (2,1,3), (2,4,6), (8,2,3) } And the question is that is there a way to measure the…

correlation similarity distance

asked Feb 28 '17 at 15:25

xtluo

233
1
3
11

7

votes

4 answers

What methods exist for distance calculation in clustering? when we should use each of them?

What methods exist for distance calculation in clustering? like Manhattan, Euclidean, etc.? Plus, I don't know when I should use them. I always use Euclidean distance.

clustering distance

asked Apr 11 '16 at 05:05

parvij

791
5
18

6

votes

2 answers

How do I test a difference between two proportions representing fatality rate for Covid 19 in Philippines and World (except Philippines)?

I'm trying to analyse if the fatality rate from my country (A third world country) vary significantly from the world's fatality rate. So I'd basically have two samples, labeled (Philippines) and (World excluding the Philippines) then i can compute…

statistics distance self-study

asked Mar 14 '20 at 02:49

Benj Cabalona Jr.

261
1
10

6

votes

3 answers

Clustering algorithm for a distance matrix

I have a similarity matrix between N objects. For each N objects, I have a measure of how similar they are between each others - 0 being identical (the main diagonal) and increasing values as they get less and less similar. Something like that…

python clustering distance

asked Mar 03 '19 at 17:46

Francis Vachon

63
1
4

6

votes

2 answers

Alternative distance to Dynamic Time Warping

I am performing a comparison among time series by using Dynamic Time Warping (DTW). However, it is not a real distance, but a distance-like quantity, since it doesn't assure the triangle inequality to hold. Reminder:d:MxM->R is a distance if for all…

machine-learning time-series distance

asked Mar 26 '18 at 10:11

Ripstein

208
2
12

6

votes

1 answer

Can I use euclidean distance for Latent Dirichlet Allocation document similarity?

I have a Latent Dirichlet Allocation (LDA) model with $K$ topics trained on a corpus with $M$ documents. Due to my hyper parameter configurations, the output topic distributions for each document is heavily distributed on only 3-6 topics and all the…

nlp lda distance similar-documents

asked Nov 17 '17 at 12:04

PyRsquared

1,666
1
12
18

6

votes

1 answer

Improve k-means accuracy

Our weapons: I am experimenting with k-means and Hadoop, where I am chained to these options for various reasons (e.g. Help me win this war!). The battlefield: I have articles, which belong to c categories, where c is fixed. I am vectorizing the…

python text-mining apache-hadoop k-means distance

asked Feb 02 '16 at 01:42

gsamaras

291
6
15

Questions tagged [distance]