
I have the following code:

n - number of 1-d points (real numbers)
numbers[n] - array holding the n numbers
list, se_list - list variables to store cluster starting points
_____________

sort(numbers)

find_cluster(start, end, ci): \\ ci - cluster index
  best_se = +infinity \\ best squared error
  list = {start}
  for i from start to end-ci+1:
    if ci > 1:
      (se, se_list) = find_cluster(i+1, end, ci-1)
    else:
      se = get_se(i+1, end)
      se_list = {i+1}
    new_se = get_se(start, i)
    if new_se + se < best_se:
      best_se = new_se + se
      list = {start} + se_list
  return (best_se, list)

\\ this computes the squared error (sum of squared deviations from the mean) of the points from start to end
get_se(start, end):
  sum = 0
  for i from start to end:
    sum = sum + numbers[i]
  mean = sum/(end-start+1)
  se = 0
  for i from start to end:
    se = se + (numbers[i]-mean)*(numbers[i]-mean)
  return se
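Since get_se contributes a factor of n to every step of the analysis, it may help to see that it does not have to be linear. The following is a minimal Python sketch, assuming 0-based inclusive indices and using the identity sum((x - mean)^2) = sum(x^2) - (sum(x))^2 / m. The helper name make_get_se is hypothetical and not part of the pseudocode above.

```python
from itertools import accumulate


def make_get_se(numbers):
    """Build a get_se(start, end) that answers in O(1) after O(n) setup.

    Hypothetical helper: indices are 0-based and inclusive, which is an
    assumption; the pseudocode does not say which convention it uses.
    """
    # prefix[i] holds the sum of the first i values (prefix[0] == 0).
    prefix = [0.0] + list(accumulate(numbers))
    prefix_sq = [0.0] + list(accumulate(x * x for x in numbers))

    def get_se(start, end):
        count = end - start + 1
        if count <= 0:
            return 0.0  # an empty range contributes no error
        total = prefix[end + 1] - prefix[start]
        total_sq = prefix_sq[end + 1] - prefix_sq[start]
        # sum((x - mean)^2) == sum(x^2) - (sum(x))^2 / count
        return total_sq - total * total / count

    return get_se
```

With this, each get_se call costs constant time, at the price of some floating-point cancellation when the values are large and close together.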

As far as my analysis goes, this algorithm takes time O(n^(2k)), where k = ci, but I am not sure if I am correct. I first proved it to myself as

T(n) = n*(n + T(n-1)) = n^2 + n*(n-1)*((n-1) + T(n-2)) + ...

so it tends to n^k, but there are also lower-order terms, each of which is certainly smaller than n^k, so I bounded the overall time by O(n^(2k)). But I am not very experienced, so what is the actual running time?
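One way to sanity-check a guess like this is to evaluate the recurrence numerically and compare it against candidate bounds. The sketch below is Python and rests on two assumptions that are not stated above: the recursion depth is tracked explicitly as a second argument k, and the base case T(n, 0) = n stands for a single linear get_se call.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n, k):
    """Evaluate T(n, k) = n * (n + T(n - 1, k - 1)).

    Assumed base case: T(n, 0) = n (one linear pass over n points).
    """
    if k == 0 or n <= 0:
        return max(n, 0)
    return n * (n + T(n - 1, k - 1))


if __name__ == "__main__":
    # Print the ratio of T(n, k) to two candidate bounds; a ratio that
    # levels off as n grows suggests the bound has the right order.
    for k in (1, 2, 3):
        for n in (10, 100, 1000):
            value = T(n, k)
            print(k, n, value / n ** k, value / n ** (2 * k))
```

This only checks the recurrence as written, not the pseudocode itself, so it is a consistency check rather than a proof.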

Details: the original problem for which I wrote this algorithm is "there are n points in 1 dimension (real numbers) which you need to cluster as k-means in polynomial time". k (I called it ci in this algorithm) is the number of clusters.
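For comparison, here is a sketch of a dynamic-programming formulation of the same problem that takes O(k * n^2) time. It is a different technique from the recursive search above, not a rewrite of it. It relies on the same observation as the pseudocode: once the points are sorted, every cluster is a contiguous run. The function name kmeans_1d and its return format are hypothetical.

```python
from itertools import accumulate


def kmeans_1d(points, k):
    """Return (best squared error, cluster start indices in sorted order)."""
    xs = sorted(points)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    # Prefix sums make the squared error of any run an O(1) lookup.
    prefix = [0.0] + list(accumulate(xs))
    prefix_sq = [0.0] + list(accumulate(x * x for x in xs))

    def run_se(a, b):
        """Squared error of xs[a:b] (half-open range, a < b)."""
        total = prefix[b] - prefix[a]
        return (prefix_sq[b] - prefix_sq[a]) - total * total / (b - a)

    inf = float("inf")
    # cost[j][i]: best error when the first i points form j clusters.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for t in range(j - 1, i):  # last cluster is xs[t:i]
                candidate = cost[j - 1][t] + run_se(t, i)
                if candidate < cost[j][i]:
                    cost[j][i] = candidate
                    cut[j][i] = t

    # Walk the stored cut points backwards to recover cluster starts.
    starts = []
    i = n
    for j in range(k, 0, -1):
        i = cut[j][i]
        starts.append(i)
    starts.reverse()
    return cost[k][n], starts
```

For example, kmeans_1d([1, 2, 10, 11], 2) splits the sorted points into [1, 2] and [10, 11]. The three nested loops give the O(k * n^2) bound directly, with no recursion to unroll.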

Details 2: By "smaller arguments" I mean that when we start expanding the parentheses in T(n) = n^2 + n*(n-1)*((n-1) + T(n-2)), there will appear one term n^k and a lot of lower-order terms. Although n^k is the largest power in this expression, if I take O(n^(2k)) > O(n^k) as the bound, it will also account for the lower-order terms (just because). But I don't think "just because" is a valid argument, especially if a constant factor should be enough. Still, the question stands: is it O(n^(2k))?

NB. I still need to check the links in the comments, but if I understood this well enough there wouldn't be a need to ask this question, right? So I do not think that referring me to other materials is right unless they cite exactly the same algorithm. I am asking someone with a good understanding of algorithms to either verify my claim (ideally with a short proof) or tell me I'm plain wrong and, if possible, explain why.

Tom