Consider a bivariate Gaussian distribution with parameters $\mu_1$ and $\mu_2$ for the two unknown means, and $\sigma_1$, $\sigma_2$ and $\rho$ for the covariance matrix,
\begin{align} \Sigma&=\left(\begin{array}{cc} \sigma_1^2 & \sigma_{12} \\ \sigma_{12}& \sigma_2^2 \end{array}\right) = \left(\begin{array}{cc} \sigma_1^2 & \rho \sigma_{1} \sigma_{2} \\ \rho \sigma_{1} \sigma_{2}& \sigma_2^2 \end{array}\right) \end{align}
Assume we have $N$ independent samples $X_1, \ldots, X_N$, each sample $X_i = (X_{i,1}, X_{i,2})$ comprising the two features of the bivariate Gaussian.
We only consider unbiased estimators; among these, one estimator is better than another if its variance is smaller. A pair of estimators for the two means is best if the sum of their variances is the smallest achievable.
1) If the covariance matrix is unknown, the best we can do to estimate $\mu_1$ and $\mu_2$ is to use the sample means,
$\hat{\mu}_1 = \sum_{i=1}^N X_{i,1} /N$ and $\hat{\mu}_2 = \sum_{i=1}^N X_{i,2} /N$, right?
2) If the covariance matrix is known, is the best we can do still the same? Is there no way to use information about the covariance matrix to improve the estimates of the means?
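In particular, the inverse of the Fisher information matrix for $(\mu_1, \mu_2)$ equals $\Sigma/N$, and its trace, $(\sigma_1^2+\sigma_2^2)/N$, does not depend on $\rho$ and equals the sum of the variances of the two sample means. This suggests that it is impossible to use information about the covariance matrix to improve the estimates.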
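A quick numerical check of this claim, offered as a sketch: it assumes i.i.d. samples and uses NumPy's `Generator.multivariate_normal`; the helper name `check_sample_means` and the parameter values are illustrative choices, not taken from the question. It compares the Monte Carlo sum of the variances of the two sample means with the trace of the inverse Fisher information, using $I(\mu_1, \mu_2) = N\Sigma^{-1}$ for a Gaussian with known $\Sigma$.

```python
import numpy as np


def check_sample_means(mu, sigma1, sigma2, rho, n, reps=20000, seed=0):
    """Hypothetical helper: compare the Monte Carlo sum of the variances of
    the two sample means with trace((N * Sigma^{-1})^{-1}) = (s1^2 + s2^2) / N."""
    rng = np.random.default_rng(seed)
    cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                    [rho * sigma1 * sigma2, sigma2**2]])
    # reps independent data sets, each with n bivariate samples: shape (reps, n, 2)
    x = rng.multivariate_normal(mu, cov, size=(reps, n))
    means = x.mean(axis=1)  # sample means of each data set, shape (reps, 2)
    mc_sum_var = means.var(axis=0, ddof=1).sum()
    fisher = n * np.linalg.inv(cov)  # Fisher information for (mu1, mu2), assuming known cov
    crlb_trace = np.trace(np.linalg.inv(fisher))
    return mc_sum_var, crlb_trace


mc, bound = check_sample_means([1.0, -2.0], 1.0, 2.0, 0.8, n=10)
print(mc, bound)
```

With $\sigma_1 = 1$, $\sigma_2 = 2$ and $N = 10$ the bound is $(1 + 4)/10 = 0.5$; changing `rho` leaves both numbers essentially unchanged, consistent with the trace not depending on $\rho$.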
This is very puzzling to me, though, especially since, if we want to estimate a single mean from a bivariate Gaussian (assuming the other mean is known to be equal to 0), we can leverage the correlation coefficient through the following estimator
\begin{equation} \overline{\mu}_1 = \frac{\sum X_{i,1}}{N} - \rho \frac{\sigma_1}{\sigma_2}\frac{\sum X_{i,2}}{N} \end{equation} whose variance, $\sigma_1^2(1-\rho^2)/N$, is smaller than $\sigma_1^2/N$ whenever $\rho \neq 0$ (see, e.g., page 4 of Sampling: regression methods on estimation).
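The variance reduction in the single-mean case can be sketched numerically, assuming $\mu_2 = 0$ is known and the samples are i.i.d. Since $E[\bar X_2] = 0$, the estimator $\bar X_1 - b\,\bar X_2$ is unbiased for $\mu_1$ with variance $(\sigma_1^2 - 2b\rho\sigma_1\sigma_2 + b^2\sigma_2^2)/N$, which is minimized at $b = \rho\sigma_1/\sigma_2$ and then equals $\sigma_1^2(1-\rho^2)/N$; this minus-sign form is the one simulated below. The helper `compare_single_mean` and the parameter values are hypothetical illustrations.

```python
import numpy as np


def compare_single_mean(mu1, sigma1, sigma2, rho, n, reps=20000, seed=1):
    """Hypothetical helper: Monte Carlo comparison, assuming mu2 = 0 is known,
    of the plain sample mean of feature 1 and the regression-adjusted
    estimator mean(X1) - rho * sigma1 / sigma2 * mean(X2)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                    [rho * sigma1 * sigma2, sigma2**2]])
    x = rng.multivariate_normal([mu1, 0.0], cov, size=(reps, n))
    m1 = x[:, :, 0].mean(axis=1)  # plain sample mean of feature 1
    m2 = x[:, :, 1].mean(axis=1)  # sample mean of feature 2; its expectation is the known mu2 = 0
    b = rho * sigma1 / sigma2     # variance-minimizing coefficient
    reg = m1 - b * m2             # regression-adjusted estimator of mu1
    return {
        "mean_plain": m1.mean(), "var_plain": m1.var(ddof=1),
        "mean_reg": reg.mean(), "var_reg": reg.var(ddof=1),
        "theory_plain": sigma1**2 / n,
        "theory_reg": sigma1**2 * (1 - rho**2) / n,
    }


result = compare_single_mean(3.0, 1.0, 2.0, 0.8, n=10)
print(result)
```

With these example values the theoretical variances are $0.1$ for the plain sample mean and $0.036$ for the adjusted estimator, and both estimators should average close to $\mu_1 = 3$.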
Why is the correlation coefficient so helpful when we want to estimate a single mean (assuming the other is known), yet useless when we want to estimate both means (assuming neither is known)? Is there any intuition behind this result?
This paper also inspired my question:
Wilks, S. S. (1932). Moments and distributions of estimates of population parameters from fragmentary samples. The Annals of Mathematical Statistics, 3(3), 163-195.