I am doing some reading (and implementation) of some Clustering Algorithms.
First I started with the well-known K-Means algorithm and implemented it directly from a paper. I got a fairly decent understanding of what is going on there.
What I am actually interested in is Particle Swarm Optimisation based clustering.
For Particle Swarm Optimisation, you need a global fitness function that (in this case) tells you how well clustered the data is.
The general notion of well clustered I have found is that each cluster is compact. But how to express this mathematically?
I thought that we would go through each cluster and check its positional variance (away from its centroid).
For some cluster C :
$$\mu(C)=\dfrac{1}{|C|}\sum_{\forall\tilde{x}\in C}\tilde{x}$$
$$\sigma^{2}(C)=\sum_{\forall\tilde{x}\in C}\left(\tilde{x}-\mu(C)\right)^{2}$$
There is no point converting to standard deviation unless using a gradient-based optimisation approach, which PSO is not.
Then the global fitness would be to minimise the average variance, which is equivalent to minimising the total variance since we have a fixed number of clusters.
If $S$ is the set of all clusters, then we are to minimise some function $g$.
$g_{mine}(S)=\sum_{\forall C\in S}\sigma^{2}(C)$
i.e. $$g_{mine}(S)=\sum_{\forall C\in S}\sum_{\forall\tilde{x}\in C}\left(\tilde{x}-\mu(C)\right)^{2}$$
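To make $g_{mine}$ concrete, here is a minimal sketch in plain Python. It assumes that $(\tilde{x}-\mu(C))^{2}$ means the squared Euclidean distance between a point and its centroid, and that a clustering is given as a list of clusters, each a list of coordinate tuples. The helper names (`centroid`, `sq_dist`, `cluster_scatter`) are mine, purely for illustration.

```python
def centroid(points):
    """mu(C): component-wise mean of the points in one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance (assumed reading of (x - mu)^2)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cluster_scatter(points):
    """sigma^2(C) as defined above: a sum, not a mean, of squared distances."""
    mu = centroid(points)
    return sum(sq_dist(x, mu) for x in points)


def g_mine(clusters):
    """Total of sigma^2(C) over every cluster C in S."""
    return sum(cluster_scatter(c) for c in clusters)


# Toy data: two clusters of two 2-D points each.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(10.0, 10.0), (10.0, 14.0)],
]
print(g_mine(clusters))  # 2.0 from the first cluster + 8.0 from the second
```

In a PSO setting, each particle would presumably encode a candidate set of centroids or assignments, and this value would be the quantity to minimise.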
According to Chen, C.-Y. & Ye, F., "Particle swarm optimization algorithm and its application to clustering analysis", Networking, Sensing and Control, 2004 IEEE International Conference on, 2004, 2, 789-794,
Equation 7 (rewritten to use my notation) is:
$$g_{chen}(S)=\sum_{\forall C\in S}\sum_{\forall\tilde{x}\in X}(\tilde{x}-\mu(C)) $$
The difference from what I had is the $\forall\tilde{x}\in X$, for $X$ the whole space of data, rather than $\forall\tilde{x}\in C$.
This is what got me wondering whether I was correct in the first place, as it can be seen that $g_{mine}\ne g_{chen}$. They cannot be equal, since the extra points change the sum and will not cancel, because all the variance terms are non-negative. However, the two would be equivalent for optimisation purposes if $\forall S, S'$ $g_{mine}(S)\ge g_{mine}(S') \iff g_{chen}(S)\ge g_{chen}(S')$
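The ordering condition above can be checked numerically on small examples. The sketch below is only that: `g_chen_assumed` is my own reading of the transcribed Equation 7, with a squared Euclidean distance added so that the result is a scalar, and it may well differ from what the paper intends. The function `same_ordering` and the toy partitions `tight` and `mixed` are likewise illustrative names.

```python
def centroid(points):
    """mu(C): component-wise mean of the points in one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def g_mine(clusters):
    """Sum over clusters of squared distances of members to their centroid."""
    return sum(sq_dist(x, centroid(c)) for c in clusters for x in c)


def g_chen_assumed(clusters):
    """ASSUMED reading: every point in X against every cluster centroid."""
    all_points = [x for c in clusters for x in c]  # X, the whole data set
    return sum(sq_dist(x, centroid(c)) for c in clusters for x in all_points)


def same_ordering(g1, g2, s_a, s_b):
    """True if g1(s_a) >= g1(s_b) holds exactly when g2(s_a) >= g2(s_b)."""
    return (g1(s_a) >= g1(s_b)) == (g2(s_a) >= g2(s_b))


# Two partitions of the same 1-D data set {0, 1, 10, 11}.
tight = [[(0.0,), (1.0,)], [(10.0,), (11.0,)]]
mixed = [[(0.0,), (10.0,)], [(1.0,), (11.0,)]]

print(g_mine(tight), g_mine(mixed))
print(g_chen_assumed(tight), g_chen_assumed(mixed))
print(same_ordering(g_mine, g_chen_assumed, tight, mixed))
```

On this toy data the assumed reading ranks the two partitions in the opposite order to $g_{mine}$, which at least suggests that the literal transcription deserves a second look against the paper.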
In Ahmadyfard, A. & Modares, H., "Combining PSO and k-means to enhance data clustering", Telecommunications, 2008. IST 2008. International Symposium on, 2008, 688-691,
Equation 6 has (again with notation switched):
$$g_{ahmadfard}(S)=\dfrac{1}{|X|} \sum_{\forall C\in S}\sum_{\forall\tilde{x}\in C}\left(\tilde{x}-\mu(C)\right)^{2}$$
This is equivalent to mine (at least up to proportionality), as $|X| \times g_{ahmadfard}(S) = g_{mine}(S)$, and since $|X|>0$ it follows that $\forall S, S'$ $g_{mine}(S)\ge g_{mine}(S') \iff g_{ahmadfard}(S)\ge g_{ahmadfard}(S')$
Ye and Chen also came up with another function in: Ye, Fun, and Ching-Yi Chen. "Alternative KPSO-clustering algorithm." Tamkang Journal of science and Engineering 8.2 (2005): 165. It is very complicated and beyond the scope of this question.
- What are the pros and cons of $g_{chen}$ vs $g_{mine}$ vs $g_{ahmadfard}$ ?