How to get the optimal value for $k$?
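You have to define a measure of optimality. The problem is that most measures (for example the within-cluster sum of squared distances) get smaller, i.e. better, as $k$ grows, so they always favor more clusters. One measure that does not improve automatically with larger $k$ is the silhouette coefficient: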
Let $C = (C_1, \dots, C_k)$ be the clusters and let $C(o)$ denote the cluster that contains the object $o$. Then:
- Average distance between an object $o$ and the objects in the same cluster: $$a(o) = \frac{1}{|C(o)|} \sum_{p \in C(o)} \text{dist}(o, p)$$
- Average distance to the nearest other cluster: $$b(o) = \min_{C_i \in C \setminus \{C(o)\}} \frac{1}{|C_i|} \sum_{p \in C_i} \text{dist}(o, p)$$
- Silhouette of an object: $$s(o) = \begin{cases}0 &\text{if } a(o) = 0, \text{ e.g. } |C(o)|=1\\
\frac{b(o)-a(o)}{\max(a(o), b(o))} &\text{otherwise}\end{cases}$$
- Silhouette of a clustering $C$: $$\text{silh}(C) = \frac{1}{|C|} \sum_{C_i \in C} \frac{1}{|C_i|} \sum_{o \in C_i} s(o)$$
It follows that $s(o) \in [-1, 1]$ and $\text{silh}(C) \in [-1, 1]$. Higher is better. A negative $s(o)$ means that $o$ is on average closer to another cluster than to its own, which is very bad.
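As a small worked example of these definitions (the data are made up for illustration): take the one-dimensional points $0, 2, 10$ with $\text{dist}(o, p) = |o - p|$ and the clustering $C_1 = \{0, 2\}$, $C_2 = \{10\}$. Then:

$$\begin{aligned}
a(0) &= \tfrac{1}{2}(0 + 2) = 1, & b(0) &= 10, & s(0) &= \tfrac{10 - 1}{10} = 0.9\\
a(2) &= \tfrac{1}{2}(2 + 0) = 1, & b(2) &= 8, & s(2) &= \tfrac{8 - 1}{8} = 0.875\\
a(10) &= 0, & & & s(10) &= 0
\end{aligned}$$

$$\text{silh}(C) = \frac{1}{2}\left(\frac{0.9 + 0.875}{2} + 0\right) \approx 0.44$$

The singleton cluster contributes $0$ and pulls the average down, even though the two clusters are well separated.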
Now you can start with $k=2$ (for $k=1$ there is no other cluster, so $b(o)$ is undefined) and increase $k$ until $\text{silh}(C)$ gets smaller again.
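This search can be sketched in pure Python. It is only a minimal sketch under some assumptions of mine: Euclidean distance, plain Lloyd iterations with a deterministic farthest-point seeding, and made-up toy data. The helper names `farthest_point_init`, `kmeans`, `silhouette` and `choose_k` are hypothetical. The loop starts at $k=2$ because $b(o)$ needs at least one other cluster.

```python
import math  # math.dist requires Python 3.8+


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns the non-empty clusters as lists of points."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; an empty cluster
        # keeps its old center.
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return [cl for cl in clusters if cl]


def silhouette(clusters):
    """silh(C): mean over clusters of the mean s(o). Needs >= 2 clusters."""
    def mean_dist(o, cluster):
        return sum(math.dist(o, p) for p in cluster) / len(cluster)

    total = 0.0
    for ci in clusters:
        s_sum = 0.0
        for o in ci:
            a = mean_dist(o, ci)  # includes dist(o, o) = 0, as in a(o) above
            b = min(mean_dist(o, cj) for cj in clusters if cj is not ci)
            s_sum += 0.0 if a == 0 else (b - a) / max(a, b)
        total += s_sum / len(ci)
    return total / len(clusters)


def choose_k(points, k_max):
    """Increase k from 2 and stop as soon as the silhouette drops."""
    best_k, best_score = 2, silhouette(kmeans(points, 2))
    for k in range(3, k_max + 1):
        score = silhouette(kmeans(points, k))
        if score < best_score:
            break
        best_k, best_score = k, score
    return best_k, best_score


# Toy data: three well-separated blobs of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 10), (5, 11), (6, 10), (6, 11)]
best_k, best_score = choose_k(points, k_max=6)
print(best_k, best_score)
```

For these three blobs the search should stop at $k=3$. Keep in mind that stopping at the first drop only finds the first local maximum of $\text{silh}(C)$; on less clean data it may be safer to evaluate a whole range of $k$ and take the best one.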
However, there are alternatives to $k$-means clustering: