Runtime of the optimal greedy $2$-approximation algorithm for the $k$-clustering problem

Question

We are given a set 2-dimensional points $|P| = n$ and an integer $k$. We must find a collection of $k$ circles that enclose all the $n$ points such that the radius of the largest circle is as small as possible. In other words, we must find a set $C = \{ c_1,c_2,\ldots,c_k\}$ of $k$ center points such that the cost function $\text{cost}(C) = \max_i \min_j D(p_i, c_j)$ is minimized. Here, $D$ denotes the Euclidean distance between an input point $p_i$ and a center point $c_j$. Each point assigns itself to the closest cluster center grouping the vertices into $k$ different clusters.

The problem is known as the (discrete) $k$-clustering problem and it is $\text{NP}$-hard. It can be shown with a reduction from the $\text{NP}$-complete dominating set problem that if there exists a $\rho$-approximation algorithm for the problem with $\rho < 2$ then $\text{P} = \text{NP}$.

The optimal $2$-approximation algorithm is very simple and intuitive. One first picks a point $p \in P$ arbitrarily and puts it in the set $C$ of cluster centers. Then one picks the next cluster center such that is as far away as possible from all the other cluster centers. So while $|C| < k$, we repeatedly find a point $j \in P$ for which the distance $D(j,C)$ is maximized and add it to $C$. Once $|C| = k$ we are done.

It is not hard to see that the optimal greedy algorithm runs in $O(nk)$ time. This raises a question: can we achieve $o(nk)$ time? How much better can we do?

Juho · Accepted Answer · 2012-05-23T23:05:25.433

The problem can be indeed viewed geometrically in a way that we would like to cover the $V$ points by $k$ balls, where the radius of the largest ball is minimized.

$O(nk)$ is indeed quite simple to achieve but one can do better. Feder and Greene, Optimal algorithms for approximate clustering, 1988 achieve a running time of $\Theta(n \log k)$ using more clever data structures and further show that this is optimal in the algebraic decision tree model.

score 1 · Answer 2 · answered Apr 26 '12 at 12:38

My question: Is there a way I can make the greedy picking strategy run in $o(|V|^2)$ time?

It seems to me that you described it. In case I read too far in your description, here is what I've understood. Have an associative data structure associating each element of $V$ with the sum of the distance to elements of $S$. This data structure can be initialized at a cost $O(|V|)$ with the distance to $p$ and this initialization can produce the next element as a side effect without increasing the complexity. It can be updated after the selection of a new element at a cost of $O(|V|)$, again producing the next element as a side effect. Repeat to get $S$. The resulting complexity is $O(k |V|)$.

Runtime of the optimal greedy $2$-approximation algorithm for the $k$-clustering problem

2 Answers2