The question is whether we can find a correlation between two sets of grades (categorical data).

Let’s say we have a dog competition and there are 1000 dogs participating.

There are two rounds of assessment:

In the first round, dog owners give their assessment on a scale from A to C, where A is excellent and C is bad. There are four criteria for assessment in both rounds (behaviour etc.).

In the second round, one judge gives his assessment of each dog based on the same criteria as in round 1. However, the grades are M - meeting expectations, E - exceeding expectations, and B - below expectations.

We understand that the round-two grades map onto the round-one scale as follows: M corresponds to B, E to A, and B (below expectations) to C.

After the two rounds, our table would look like this:


| dog             | round one | round two |
| --------------- | --------- | --------- |
| Dog1_criteria1  | A         | B         |
| Dog1_criteria2  | A         | E         |
| Dog1_criteria3  | A         | E         |
| Dog1_criteria4  | B         | M         |
| Dog2_criteria1  | A         | E         |
| Dog2_criteria2  | B         | M         |
| Dog2_criteria3  | A         | E         |
| Dog2_criteria4  | C         | B         |
....

How do we find a correlation between the two sets of answers? Thank you!


1 Answer

You can treat this as an inter-rater agreement problem and use Cohen's weighted kappa or a similar measure. Weighted kappa takes into account the distribution of ratings in each round and the degree of disagreement between grades.

Three matrices are involved: the matrix of observed scores, the matrix of expected scores based on chance agreement, and the weight matrix. Weight-matrix cells on the diagonal (upper left to lower right) represent agreement and so contain zeros. Off-diagonal cells contain weights indicating how far apart the two ratings are; often, cells one step off the diagonal are weighted 1, those two steps off 2, and so on. (Source: Wikipedia.)
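
For the three grades here (A, B, C), a minimal sketch of such a linear weight matrix with NumPy might look like this:

```python
import numpy as np

# Linear weight matrix for 3 ordered grades (A, B, C):
# 0 on the diagonal (agreement), 1 one step off the diagonal, 2 two steps off.
k = 3
linear_weights = np.abs(np.arange(k)[:, None] - np.arange(k)[None, :])
print(linear_weights)
# [[0 1 2]
#  [1 0 1]
#  [2 1 0]]
```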

The equation for weighted κ is:

$$\kappa = 1 - \frac{{\sum_{i=1}^k}{\sum_{j=1}^k}w_{ij}x_{ij}}{{\sum_{i=1}^k}{\sum_{j=1}^k}w_{ij}m_{ij}}$$

where $k$ is the number of grade categories and $w_{ij}$, $x_{ij}$, $m_{ij}$ are elements of the weight, observed, and expected matrices, respectively. When the diagonal cells have weight 0 and all off-diagonal cells have weight 1, this formula reduces to ordinary (unweighted) kappa.
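
As a rough illustration of this formula (a sketch, not a production implementation), assuming the observed counts sit in a k x k NumPy array and the expected matrix is built from its row and column totals:

```python
import numpy as np

def weighted_kappa(observed, weights):
    """Weighted kappa from a k x k observed-count matrix and a weight matrix.

    The expected (chance) matrix m is derived from the marginal totals of the
    observed matrix, as in ordinary Cohen's kappa.
    """
    observed = np.asarray(observed, dtype=float)
    n = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```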

In practice, you may want to use an implementation in Python, R, or a statistical software package rather than manual calculations.
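
For instance, scikit-learn's cohen_kappa_score supports linear and quadratic weights. A quick sketch using the eight rows shown in the question, with the round-two grades recoded onto the round-one scale:

```python
from sklearn.metrics import cohen_kappa_score

# Eight example rows from the question's table
round_one = ["A", "A", "A", "B", "A", "B", "A", "C"]
# Round-two grades recoded as E -> A, M -> B, B -> C
round_two = ["C", "A", "A", "B", "A", "B", "A", "C"]

# weights=None gives plain Cohen's kappa; "linear" or "quadratic" give weighted kappa
kappa = cohen_kappa_score(round_one, round_two, labels=["A", "B", "C"], weights="linear")
print(kappa)
```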

Here is some intuition, from a similar example, on why to use weighted kappa (round-two grades have been converted to the round-one scale):

| dog             | round one | round two |
| --------------- | --------- | --------- |
| Dog1_criteria1  | A         | C         |
| Dog1_criteria2  | A         | B         |
| Dog1_criteria3  | A         | A         |
| Dog1_criteria4  | B         | B         |
| Dog2_criteria1  | A         | A         |
| Dog2_criteria2  | A         | A         |
| Dog2_criteria3  | A         | A         |
| Dog2_criteria4  | C         | C         |

You could look at percent agreement and give a score of 6 out of 8, or 0.75. That seems good, but suppose the round-two judge simply gave every dog an A: that would also score 0.75. So we need to factor in the frequency of the grades and the probability of agreeing on each combination by chance. That's where the expected matrix comes in.
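
A quick sketch of that point, using the eight example rows above and scikit-learn's cohen_kappa_score (the all-A judge is hypothetical, added only for contrast):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

round_one = ["A", "A", "A", "B", "A", "A", "A", "C"]
round_two = ["C", "B", "A", "B", "A", "A", "A", "C"]
all_a = ["A"] * 8  # a judge who simply gives everyone an A

# Raw percent agreement is the same in both cases ...
print(np.mean(np.array(round_one) == np.array(round_two)))  # 0.75
print(np.mean(np.array(round_one) == np.array(all_a)))      # 0.75

# ... but kappa corrects for chance agreement and tells them apart
print(cohen_kappa_score(round_one, round_two))  # ~0.56
print(cohen_kappa_score(round_one, all_a))      # 0.0
```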

Then there is the degree of disagreement between two ratings. You usually want to give more credit to an A/B or B/C pair than to an A/C pair, and the differences may not be linear at all. The weight matrix lets you account for this.
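
For example, squaring the distances gives quadratic weights, which penalize an A/C disagreement relatively more than an adjacent A/B or B/C one (a sketch mirroring the linear matrix above):

```python
import numpy as np

# Quadratic weights for 3 ordered grades: A/C disagreement gets weight 4,
# while adjacent A/B and B/C disagreements get weight 1.
k = 3
quadratic_weights = (np.arange(k)[:, None] - np.arange(k)[None, :]) ** 2
print(quadratic_weights)
# [[0 1 4]
#  [1 0 1]
#  [4 1 0]]
```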

The observed matrix is simply the count of observations for each possible pair of ratings.
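
With the example data above, a small sketch of building that observed matrix with pandas:

```python
import pandas as pd

round_one = ["A", "A", "A", "B", "A", "A", "A", "C"]
round_two = ["C", "B", "A", "B", "A", "A", "A", "C"]

# Observed matrix: rows are round-one grades, columns are round-two grades
observed = pd.crosstab(pd.Series(round_one, name="round one"),
                       pd.Series(round_two, name="round two"))
print(observed)
# round two  A  B  C
# round one
# A          4  1  1
# B          0  1  0
# C          0  0  1
```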

A final note: there are several variations of kappa, such as quadratic weighted kappa. Most of them should work well for comparing grades.
