You can treat this as a type of inter-rater agreement problem and use Cohen's weighted kappa or a similar measure. Weighted kappa accounts for both the distribution of grades in each round and the degree of disagreement between grades.
Three matrices are involved: the matrix of observed scores, the matrix of expected scores based on chance agreement, and the weight matrix. Weight matrix cells located on the diagonal (upper-left to bottom-right) represent agreement and thus contain zeros. Off-diagonal cells contain weights indicating the level of separation between ratings. Often, cells one off the diagonal are weighted 1, those two off 2, etc.
(source: Wikipedia's article on Cohen's kappa)
The equation for weighted κ is:
$$\kappa = 1 - \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,x_{ij}}{\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,m_{ij}}$$
where $k$ is the number of grade categories (codes) and $w_{ij}$, $x_{ij}$, and $m_{ij}$ are elements of the weight, observed, and expected matrices, respectively. When the diagonal cells contain weights of 0 and all off-diagonal cells weights of 1, this formula produces the same value as ordinary (unweighted) kappa.
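To make the formula concrete, here is a minimal sketch in Python (assuming NumPy, and assuming the three $k \times k$ matrices have already been tabulated; the function name is just illustrative):

```python
import numpy as np

def weighted_kappa(observed, expected, weights):
    """Weighted kappa as in the formula above.

    observed : k x k counts x_ij (rows = round one, columns = round two)
    expected : k x k chance-expected counts m_ij (from the marginal totals)
    weights  : k x k disagreement weights w_ij (zeros on the diagonal)
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # 1 minus (weighted observed disagreement) / (weighted chance disagreement)
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```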
In practice, you will probably want to use an existing implementation in Python, R, or a statistical package rather than compute this by hand.
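For example, scikit-learn's `cohen_kappa_score` supports linear and quadratic weights. A minimal sketch, using the round-one/round-two grades from the example table below typed out in row order:

```python
from sklearn.metrics import cohen_kappa_score

# Grades from the example table below, in row order.
round_one = ["A", "A", "A", "B", "A", "A", "A", "C"]
round_two = ["C", "B", "A", "B", "A", "A", "A", "C"]

# Linear weights: an A/C disagreement is penalized twice as much as A/B or B/C.
print(cohen_kappa_score(round_one, round_two, weights="linear"))
```

For this data the linearly weighted kappa comes out around 0.54, noticeably lower than the naive 0.75 percent agreement discussed below.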
Here is some intuition, using a similar example, for why weighted kappa is worth using (round-two grades have been converted to the round-one scale):
| dog / criterion | round one | round two |
| --------------- | --------- | --------- |
| Dog1_criteria1 | A | C |
| Dog1_criteria2 | A | B |
| Dog1_criteria3 | A | A |
| Dog1_criteria4 | B | B |
| Dog2_criteria1 | A | A |
| Dog2_criteria2 | A | A |
| Dog2_criteria3 | A | A |
| Dog2_criteria4 | C | C |
You could look at percent agreement and give a score of 6 of 8, or 0.75. That seems good, but suppose the round-two judges had simply given every entry an A: agreement with round one would still be 6 of 8, or 0.75. So we need to factor in the frequency of each grade and the probability that a given pair of grades agrees by chance. That's where the expected matrix comes in; the sketch below makes this concrete.
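A small sketch (plain Python, using only the grade counts from the table) of how chance agreement is estimated from the grade frequencies and how it discounts that raw 0.75:

```python
# Grade frequencies from the table: round one gave 6 A's, 1 B, 1 C;
# round two gave 4 A's, 2 B's, 2 C's (n = 8 items).
n = 8
round_one_counts = {"A": 6, "B": 1, "C": 1}
round_two_counts = {"A": 4, "B": 2, "C": 2}

observed_agreement = 6 / n  # the naive 0.75

# Chance agreement: the probability that the two rounds hand out the same
# grade purely at random, given each round's grade frequencies.
expected_agreement = sum(
    (round_one_counts[g] / n) * (round_two_counts[g] / n) for g in "ABC"
)  # (6*4 + 1*2 + 1*2) / 64 = 0.4375

# Unweighted kappa discounts the agreement that chance alone would produce.
kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)
print(expected_agreement, kappa)  # 0.4375 and roughly 0.56
```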
Then there is the degree of disagreement between two ratings. You usually want to treat an A/B or B/C disagreement as milder than an A/C disagreement, and the differences may not be linear at all. The weight matrix lets you account for this, as in the sketch below.
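For three grades A, B, and C, typical weight matrices look like this (a sketch; the "custom" values are invented purely to show that you can choose your own penalties):

```python
import numpy as np

# Rows and columns ordered A, B, C; each entry is a disagreement penalty.
linear = np.array([[0, 1, 2],
                   [1, 0, 1],
                   [2, 1, 0]])

quadratic = np.array([[0, 1, 4],
                      [1, 0, 1],
                      [4, 1, 0]])

# Hypothetical custom weights, e.g. if an A/C mix-up is considered far worse
# than a one-step disagreement.
custom = np.array([[0, 1, 5],
                   [1, 0, 1],
                   [5, 1, 0]])
```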
The observed matrix is simply a cross-tabulation: the count of observations for each possible pair of round-one and round-two ratings.
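For the table above, the observed matrix is just a confusion matrix. A quick sketch (reusing the `round_one`/`round_two` lists from earlier):

```python
from sklearn.metrics import confusion_matrix

round_one = ["A", "A", "A", "B", "A", "A", "A", "C"]
round_two = ["C", "B", "A", "B", "A", "A", "A", "C"]

# Rows = round one, columns = round two, both ordered A, B, C.
observed = confusion_matrix(round_one, round_two, labels=["A", "B", "C"])
print(observed)
# [[4 1 1]
#  [0 1 0]
#  [0 0 1]]
```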
A final note: there are several variations of kappa, such as quadratic weighted kappa. Most of them should work well for comparing grades.
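In scikit-learn, switching to the quadratic variant is a one-argument change (same sketch as before):

```python
from sklearn.metrics import cohen_kappa_score

round_one = ["A", "A", "A", "B", "A", "A", "A", "C"]
round_two = ["C", "B", "A", "B", "A", "A", "A", "C"]

# Quadratic weights penalize a two-step (A/C) disagreement four times
# as much as a one-step (A/B or B/C) disagreement.
print(cohen_kappa_score(round_one, round_two, weights="quadratic"))
```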