I need to assess the similarity between a set of position frequency matrices (ultimately, to test whether there is a significant difference between 2 groups of 8 matrices).
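For the final goal (comparing two groups of 8 matrices), one possible framing is a permutation test on pairwise matrix distances, similar in spirit to distance-based tests such as PERMANOVA or ANOSIM but with a simpler statistic. This is a minimal sketch under several assumptions: the per-matrix distance (mean Euclidean distance over corresponding columns) is a hypothetical choice that any column measure could replace, the matrices in both groups are assumed to share the same categories and positions, and the 16 matrices are treated as independent. The helper names `matrix_distance`, `group_statistic` and `permutation_test` are illustrative only, and the example data are synthetic.

```python
import numpy as np


def matrix_distance(P, Q):
    """Hypothetical matrix distance: mean Euclidean distance over
    corresponding columns. P and Q have shape (categories, positions)."""
    return np.sqrt(((P - Q) ** 2).sum(axis=0)).mean()


def group_statistic(D, labels):
    """Mean between-group distance minus mean within-group distance."""
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = D[same & off_diag].mean()
    between = D[~same].mean()
    return between - within


def permutation_test(group1, group2, n_perm=9999, seed=0):
    """Permutation test of the group statistic; returns (statistic, p-value)."""
    mats = list(group1) + list(group2)
    n = len(mats)
    # Pairwise distance matrix, computed once; permutations only relabel it.
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = matrix_distance(mats[i], mats[j])
    labels = np.array([0] * len(group1) + [1] * len(group2))
    observed = group_statistic(D, labels)
    rng = np.random.default_rng(seed)
    hits = sum(
        group_statistic(D, rng.permutation(labels)) >= observed
        for _ in range(n_perm)
    )
    # The +1 counts the observed labelling as one of the permutations.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Synthetic illustration only: random Dirichlet columns, not real data.
# Each matrix is 5 categories x 3 positions; each column sums to 1.
rng = np.random.default_rng(1)
g1 = [rng.dirichlet([10, 1, 1, 1, 1], size=3).T for _ in range(8)]
g2 = [rng.dirichlet([1, 1, 1, 1, 10], size=3).T for _ in range(8)]
stat, p = permutation_test(g1, g2, n_perm=999)
print(f"statistic = {stat:.3f}, permutation p-value = {p:.4f}")
```

A positive statistic means matrices from different groups are, on average, farther apart than matrices from the same group; the p-value is the fraction of random relabellings that produce a statistic at least that large.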
A simplified example with two matrices is shown below (my real matrices are 250x3). The values are relative frequencies of categories 1..5 within each column, so each column sums to 1. Corresponding columns of the two matrices can differ in magnitude, as A2 and B2 do, or in distribution, as A3 and B3 do. The most dissimilar column pair is A1 and B1.
Cat. |  A1    A2    A3   |  B1    B2    B3
-----+-------------------+------------------
  1  | 0,00  0,20  0,20  | 1,00  0,15  0,00
  2  | 0,00  0,50  0,50  | 0,00  0,60  0,10
  3  | 0,00  0,20  0,20  | 0,00  0,15  0,20
  4  | 0,00  0,10  0,10  | 0,00  0,10  0,50
  5  | 1,00  0,00  0,00  | 0,00  0,00  0,20
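To make the setup concrete, the two example matrices can be written as NumPy arrays with categories as rows and positions as columns, and the "each column sums to 1" property checked directly. This is only a sketch; the decimal commas of the table are written as decimal points, and the array layout is an assumption about how the data would be stored.

```python
import numpy as np

# Rows = categories 1..5, columns = positions (A1..A3 and B1..B3).
A = np.array([
    [0.00, 0.20, 0.20],
    [0.00, 0.50, 0.50],
    [0.00, 0.20, 0.20],
    [0.00, 0.10, 0.10],
    [1.00, 0.00, 0.00],
])
B = np.array([
    [1.00, 0.15, 0.00],
    [0.00, 0.60, 0.10],
    [0.00, 0.15, 0.20],
    [0.00, 0.10, 0.50],
    [0.00, 0.00, 0.20],
])

# Each column is a distribution over the 5 categories, so every column
# sum should be 1 up to floating-point rounding.
for name, M in (("A", A), ("B", B)):
    col_sums = M.sum(axis=0)
    assert np.allclose(col_sums, 1.0), (name, col_sums)
    print(name, "column sums:", col_sums)
```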
What would be the best measure of (dis)similarity in this case?
Some possibilities I have found:
Compute the Euclidean distance between each pair of corresponding columns and convert it to a similarity (as in http://rsat.sb-roscoff.fr/help.compare-matrices.html#_dis_similarity_metrics ).
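A minimal sketch of the column-wise Euclidean approach follows. One assumption to flag: the distance-to-similarity conversion used here, `1 - d / sqrt(2)`, is chosen because two probability vectors can be at most `sqrt(2)` apart (two one-hot columns on different categories); it is not necessarily the conversion RSAT compare-matrices uses. The function names are hypothetical.

```python
import numpy as np

A = np.array([
    [0.00, 0.20, 0.20],
    [0.00, 0.50, 0.50],
    [0.00, 0.20, 0.20],
    [0.00, 0.10, 0.10],
    [1.00, 0.00, 0.00],
])
B = np.array([
    [1.00, 0.15, 0.00],
    [0.00, 0.60, 0.10],
    [0.00, 0.15, 0.20],
    [0.00, 0.10, 0.50],
    [0.00, 0.00, 0.20],
])


def column_euclidean(P, Q):
    """Euclidean distance between corresponding columns of P and Q."""
    return np.sqrt(((P - Q) ** 2).sum(axis=0))


def column_similarity(P, Q):
    """Assumed conversion: scale by the maximum possible distance sqrt(2)
    between two probability vectors, so the result lies in [0, 1]."""
    return 1.0 - column_euclidean(P, Q) / np.sqrt(2.0)


d = column_euclidean(A, B)
s = column_similarity(A, B)
for j in range(A.shape[1]):
    print(f"column pair {j + 1}: distance = {d[j]:.4f}, similarity = {s[j]:.4f}")
print("mean similarity over columns:", s.mean())
```

On the example matrices the distances are about 1.414, 0.122 and 0.632 for column pairs 1, 2 and 3 (similarities about 0.00, 0.91 and 0.55), so A1/B1 comes out as the most dissimilar pair.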
Would the Pearson correlation coefficient be better suited for this than Euclidean distance? (as in https://academic.oup.com/bioinformatics/article/21/3/307/237585 )
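For comparison, here is a sketch of the column-wise Pearson correlation on the same example. It is vectorized over columns but should agree with `np.corrcoef` applied to each column pair; the function name is hypothetical. Note that a constant column (for example, uniform over the 5 categories) has zero variance, so its correlation is undefined and is returned as NaN.

```python
import numpy as np

A = np.array([
    [0.00, 0.20, 0.20],
    [0.00, 0.50, 0.50],
    [0.00, 0.20, 0.20],
    [0.00, 0.10, 0.10],
    [1.00, 0.00, 0.00],
])
B = np.array([
    [1.00, 0.15, 0.00],
    [0.00, 0.60, 0.10],
    [0.00, 0.15, 0.20],
    [0.00, 0.10, 0.50],
    [0.00, 0.00, 0.20],
])


def column_pearson(P, Q):
    """Pearson correlation between corresponding columns of P and Q.

    Returns NaN for a column pair where either column is constant."""
    Pc = P - P.mean(axis=0)  # centre each column on its own mean
    Qc = Q - Q.mean(axis=0)
    num = (Pc * Qc).sum(axis=0)
    den = np.sqrt((Pc ** 2).sum(axis=0) * (Qc ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den


r = column_pearson(A, B)
for j, value in enumerate(r, start=1):
    print(f"column pair {j}: Pearson r = {value:.3f}")
```

On the example matrices this gives roughly -0.25 (A1/B1), 0.98 (A2/B2) and -0.43 (A3/B3). Correlation therefore rates A3/B3 as less similar than A1/B1, which does not match the expectation stated above that A1/B1 is the most dissimilar pair.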
My thinking may be completely wrong, as my knowledge of this area is very limited, so any suggestions would be much appreciated, even ones that completely overturn my approach.
There is an answer in "Distance/Similarity between two matrices", but it is very general, and I hope there is something more specific for position frequency matrices.
----
Sorry for cross-posting from the statistics forum (Cross Validated), but this forum apparently has more users, and a similar question on Cross Validated has gone unanswered for a long time ( https://stats.stackexchange.com/questions/264183/looking-for-measures-of-similarity-for-two-matrices-of-pairwise-similarities-d ), so I am turning to this forum instead.