Suppose I'm given $n$ points $x_1,\dots,x_n$ in some space $\mathcal{S}$ (think: $\mathbb{R}^d$), and probabilities $p_1,\dots,p_n$ that form a probability distribution (so $p_1 + \dots + p_n=1$). Imagine I have a source that outputs a point by choosing among the $n$ points according to the probability distribution $p_1,\dots,p_n$. Also, I have a distance measure $D(\cdot,\cdot)$ so that $D(x,y)$ is the dissimilarity between two points $x,y$.

I want to design a codebook $y_1,\dots,y_m \in \mathcal{S}$ that provides low-distortion encoding for this source, where $m$ is given and $m<n$. Each point $x_i$ will be mapped to the $y_j$ that is most similar to $x_i$ (i.e., the one minimizing $D(x_i,y_j)$); then I'll transmit $y_j$ instead of $x_i$. This incurs distortion $D(x_i,y_j)$. Let $f(\cdot)$ be the function that maps $i$ to $j$, i.e., $y_{f(i)}$ is the $y$-point that is nearest to $x_i$. The expected distortion of a codebook is $$\sum_{i=1}^n p_i \cdot D(x_i,y_{f(i)}).$$ Given the $x_i$'s, $p_i$'s, and $m$, I want to find a codebook $y_1,\dots,y_m$ whose expected distortion is as small as possible.
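To make the objective concrete, here is a minimal sketch of how the expected distortion of a given codebook could be evaluated. The names `nearest_index`, `expected_distortion`, and `toy_D` are hypothetical, and the cubed absolute difference is only a stand-in for my actual dissimilarity (it is symmetric and non-negative but violates the triangle inequality).

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")  # type of a point in the space S


def nearest_index(x: P, codebook: Sequence[P], D: Callable[[P, P], float]) -> int:
    """Return f(i): the index j of the codeword y_j minimizing D(x, y_j)."""
    return min(range(len(codebook)), key=lambda j: D(x, codebook[j]))


def expected_distortion(
    xs: Sequence[P],
    ps: Sequence[float],
    codebook: Sequence[P],
    D: Callable[[P, P], float],
) -> float:
    """Compute sum_i p_i * D(x_i, y_{f(i)}) for the given codebook."""
    return sum(
        p * D(x, codebook[nearest_index(x, codebook, D)])
        for x, p in zip(xs, ps)
    )


def toy_D(a: float, b: float) -> float:
    """Illustrative dissimilarity only (an assumption, not the real D)."""
    return abs(a - b) ** 3


# Toy instance: n = 4 points on the real line, m = 2 codewords.
toy_xs = [0.0, 1.0, 4.0, 5.0]
toy_ps = [0.1, 0.4, 0.4, 0.1]
toy_codebook = [1.0, 4.0]
toy_value = expected_distortion(toy_xs, toy_ps, toy_codebook, toy_D)
```

The function only evaluates a candidate codebook; it says nothing about how to find a good one.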

Equivalently, this can be phrased as a clustering problem. I'm given $m$, and I want to cluster the points $x_1,\dots,x_n$ into $m$ clusters, with one centroid for each cluster, such that the expected distance from each point to the centroid of the cluster containing it is as small as possible. (This expected distance is computed by taking a weighted average over all the points $x_i$, weighted by the probabilities $p_i$.)

Are there any reasonable algorithms for this? This reminds me of clustering, but I've never seen clustering algorithms that take into account probabilities $p_1,\dots,p_n$. Alternatively, this reminds me of vector quantization, but the methods I've seen for vector quantization all focus on the $L_2$ distance metric (i.e., $D(x,y)=\|x-y\|_2$), and here I have a different distance measure -- in fact, it's not even a metric. Those methods don't seem to generalize in any clear way to an alternative distance measure. I do have $D(s,t) \ge 0$, $D(s,t)=D(t,s)$, and $D(s,s)=0$ for all $s,t$.
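For reference, the kind of baseline I can imagine is a weighted k-medoids-style alternation, sketched below. This is only a heuristic under a strong extra assumption that is not part of the problem: the codewords are restricted to the input points $x_i$, so it never needs to compute a "mean" under $D$. It uses only $D(s,t)\ge 0$ and $D(s,s)=0$, reaches a local optimum at best, and the names `weighted_k_medoids` and `cubic_D` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")  # type of a point in the space S


def weighted_k_medoids(
    xs: Sequence[P],
    ps: Sequence[float],
    m: int,
    D: Callable[[P, P], float],
    iters: int = 100,
    seed: int = 0,
) -> List[P]:
    """Heuristic codebook search with codewords restricted to input points."""
    rng = random.Random(seed)
    n = len(xs)
    medoids = rng.sample(range(n), m)  # indices of the current codewords
    for _ in range(iters):
        medoid_set = set(medoids)
        # Assignment step: each point goes to its nearest medoid. A medoid is
        # assigned to itself, which is a nearest choice since D(s, s) = 0 <= D.
        assign = [
            i if i in medoid_set else min(medoids, key=lambda j: D(xs[i], xs[j]))
            for i in range(n)
        ]
        # Update step: within each cluster, pick the member that minimizes the
        # probability-weighted dissimilarity to the other members.
        new_medoids = []
        for j in medoids:
            members = [i for i in range(n) if assign[i] == j]
            best = min(
                members,
                key=lambda c: sum(ps[i] * D(xs[i], xs[c]) for i in members),
            )
            new_medoids.append(best)
        if set(new_medoids) == medoid_set:
            break  # no medoid changed: local optimum reached
        medoids = new_medoids
    return [xs[j] for j in medoids]


def cubic_D(a: float, b: float) -> float:
    """Illustrative non-metric dissimilarity (an assumption, not the real D)."""
    return abs(a - b) ** 3


demo_xs = [0.0, 1.0, 4.0, 5.0]
demo_ps = [0.1, 0.4, 0.4, 0.1]
demo_codebook = weighted_k_medoids(demo_xs, demo_ps, 2, cubic_D)
```

Each iteration costs on the order of $n \cdot m$ evaluations of $D$ for the assignment plus up to $n^2$ for the update, and the result depends on the random initialization, so I would expect to need several restarts. I don't know whether anything better is possible when the codewords are allowed to be arbitrary points of $\mathcal{S}$.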

D.W.
