6

I am comparing time series using Dynamic Time Warping (DTW). However, DTW is not a true distance (a metric), only a distance-like quantity, because it does not guarantee that the triangle inequality holds.

Reminder: d: M × M → R is a distance if, for all x, y, z in M:

1 - d(x,y) ≥ 0, and d(x,y) = 0 if and only if x = y
2 - It is symmetric: d(x,y) = d(y,x)
3 - Triangle inequality: d(x,z) ≤ d(x,y) + d(y,z)
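
To make the failure concrete, here is a minimal sketch of a textbook DTW and a small counterexample. It assumes the absolute difference as the local cost and no warping-window constraint; the function name `dtw` and the toy series are illustrative only, not taken from any particular package.

```python
import math

def dtw(a, b):
    """Plain DTW cost between two numeric sequences.

    Assumptions: local cost is |a_i - b_j|, steps are (i-1, j), (i, j-1),
    (i-1, j-1), and there is no window constraint.
    """
    n, m = len(a), len(b)
    # D[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Toy series chosen to break the metric axioms
x, y, z = [0], [1], [1, 1]
d_xy, d_yz, d_xz = dtw(x, y), dtw(y, z), dtw(x, z)

print(d_xy, d_yz, d_xz)
print("triangle inequality holds:", d_xz <= d_xy + d_yz)
```

With these series, d(y,z) is zero although y and z differ, so condition 1 fails as well as condition 3.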

Is there an equivalent measure that satisfies the conditions of a distance in the mathematical sense? Obviously, I am not looking for the Euclidean distance, but for one that allows a proper classification of my series in a later clustering step. If so, is there a solid implementation in an R or Python package?

Ripstein

2 Answers

3

As suggested in an answer to this SO question, you could use elastic matching with Levenshtein distance for your task. Levenshtein distance obeys the triangle inequality and is therefore a metric.

Elastic matching has been suggested for comparing time series data, whereas Levenshtein distance works on character data.

There is an implementation of elastic matching and Levenshtein distance calculation in Python.

To put them together you most probably need to build your own implementation.
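
As a rough starting point for such an implementation, the sketch below computes the Levenshtein distance over arbitrary sequences and applies it to time series that have first been discretized into symbols. The binning step (`to_symbols` and the `edges` thresholds) is purely an assumption made for illustration; how to turn a numeric series into symbols is a design decision this answer does not settle.

```python
def levenshtein(s, t):
    """Edit distance (insert, delete, substitute, each costing 1)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]

def to_symbols(series, edges):
    """Hypothetical discretization: map each value to the index of its bin."""
    return tuple(sum(v >= e for e in edges) for v in series)

edges = [0.33, 0.66]  # assumed thresholds giving three symbols: 0, 1, 2
a = to_symbols([0.1, 0.2, 0.8, 0.9], edges)
b = to_symbols([0.1, 0.8, 0.9, 0.9], edges)
print(a, b, levenshtein(a, b))
```

Note that the metric property then holds on the symbol sequences, not on the original series: two different series that discretize to the same symbols end up at distance zero.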

mico
-1

When you say "However, it is not a real distance, but a distance-like quantity," what you really mean is that it is a measure, not a metric.

Why do you think you need a metric?

Consider the following common American girl names:

[Lisabeth, Beth, Lisa, Maryanne, Anne, Mary]

If asked to cluster these names into two groups, we would surely expect [{Lisabeth, Lisa, Beth}, {Maryanne, Mary, Anne}].

However, no distance measure that insists on the triangle inequality would put “Beth” and “Lisa” in the same group, since they do not share a single character with each other. Yet both share one character with “Anne”.

There is a tutorial on DTW here: http://www.cs.unm.edu/~mueen/DTW.pdf