
Suppose $A$ is a $d\times d$ matrix formed by multiplying $d$ independent random matrices of the form $I-vv^T$, where $v$ is sampled isotropically from the unit sphere.

It appears that for various values of $d$, $\operatorname{Tr}(A)$ and $\|A\|_F^2$ are almost equal. How can we explain this fact?

Background:

  • The spectrum of this matrix is visualized here; the matrix is highly non-normal.
  • Singular values also appear to work for computing higher spectral moments, as discussed here.

Notebook
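The observation is easy to reproduce numerically. A minimal sketch, assuming NumPy (`random_A` is my own helper name, not from the notebook):

```python
import numpy as np

def random_A(d, rng):
    """Product of d independent factors (I - v v^T), v uniform on the unit sphere."""
    A = np.eye(d)
    for _ in range(d):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)        # normalizing a Gaussian vector gives an isotropic direction
        A -= np.outer(v, v @ A)       # apply (I - v v^T) on the left without forming it
    return A

rng = np.random.default_rng(0)
d = 100
A = random_A(d, rng)
tr = np.trace(A)
fro2 = np.linalg.norm(A, "fro") ** 2
print(tr, fro2, d * (1 - 1 / d) ** d)  # all three land close together
```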

RobPratt
  • Is $v'$ the transpose of the column vector $v$? – CyclotomicField Dec 25 '24 at 21:28
  • @CyclotomicField yes – Yaroslav Bulatov Dec 25 '24 at 21:31
  • These numbers are getting suspiciously close to $e^{-1}$ – whpowell96 Dec 25 '24 at 23:26
  • @whpowell96 I can show that expected trace is equal to $(1-1/d)^d$, but I don't understand why squared Frobenius norm also has this value – Yaroslav Bulatov Dec 26 '24 at 00:44
  • I see. This is a very interesting problem! Could you try to visualize the densities of $Re(\lambda_i)$ and $\sigma_i^2$? The results seem to indicate that they have the same mean as $d\to\infty$, but it may be that the densities share more similarities asymptotically – whpowell96 Dec 26 '24 at 02:14
  • @whpowell96 I visualize absolute values of eigenvalues and squared singular values in https://math.stackexchange.com/questions/5015510/when-are-spectral-moments-equal-to-sums-of-powers-of-singular-values – Yaroslav Bulatov Dec 26 '24 at 02:20
  • Densities seem different, i.e. the matrix has negative eigenvalues, so it needs to have larger positive eigenvalues to balance the sum out, since squared singular values are only positive – Yaroslav Bulatov Dec 26 '24 at 02:21
  • So there are two distinct distributions with the same mean. That is odd. When I get some time I will play around with this construction to see what sort of tweaks break this equivalence – whpowell96 Dec 26 '24 at 02:30
  • @Jap88 individual projection matrices don't commute so their product is not a projection – Yaroslav Bulatov Dec 26 '24 at 04:46

2 Answers


It is easy to show that $\operatorname{tr}(A)$ and $\|A\|_F^2=\operatorname{tr}(A'A)$ have the same mean $d(1-\frac{1}{d})^d$. The hard part, which I am not able to answer here, is whether (or why) the distributions of $\operatorname{tr}(A)$ and $\|A\|_F^2$ are highly concentrated about their common mean.

Let $P_i=I-v_iv_i'$, where $v_1,v_2,\ldots,v_d$ are independently and uniformly sampled from the unit sphere. Consider any matrix product of the form $P_{j_1}P_{j_2}\cdots P_{j_k}$. For every constant orthogonal matrix $Q$, since $Qv_i$ has the same distribution as $v_i$, we have $$ E(P_{j_1}P_{j_2}\cdots P_{j_k}) =E\left((QP_{j_1}Q')(QP_{j_2}Q')\cdots (QP_{j_k}Q')\right) =QE(P_{j_1}P_{j_2}\cdots P_{j_k})Q'. $$ That is, $E(P_{j_1}P_{j_2}\cdots P_{j_k})$ is invariant under every orthogonal similarity transform. Hence it must be a scalar multiple of $I$.

In particular, as $E(P_i)=E(I-v_iv_i')$ has trace $d-1$, it must be equal to $cI$ where $c=1-\frac{1}{d}$. Hence, using the independence of the $P_i$'s and the idempotence $P_i^2=P_i$, \begin{align*} E(A) &=E\left(P_1P_2\cdots P_d\right)\\ &=E(P_1)E(P_2)\cdots E(P_d)\\ &=c^dI,\\ E\left(AA'\right) &=E\left(P_1P_2\cdots P_{d-1}P_dP_dP_{d-1}\cdots P_1\right)\\ &=E\left(P_1P_2\cdots P_{d-1}P_dP_{d-1}\cdots P_1\right)\\ &=E\left(P_1P_2\cdots P_{d-1}E(P_d)P_{d-1}\cdots P_1\right)\\ &=E\left(P_1P_2\cdots P_{d-1}(cI)P_{d-1}\cdots P_2P_1\right)\\ &=cE\left(P_1P_2\cdots P_{d-1}P_{d-1}\cdots P_2P_1\right)\\ &=\cdots\\ &=c^dE(I)=c^dI. \end{align*} It follows that $E(\operatorname{tr}(A))=dc^d=E(\operatorname{tr}(AA'))=E(\|A\|_F^2)$.
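A quick Monte Carlo check of the scalar-multiple-of-$I$ claim, assuming NumPy (the dimension and sample size are arbitrary choices of mine): the empirical mean of $P_1P_2$ should be close to $c^2 I$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, trials = 8, 100_000
c = 1 - 1 / d
S = np.zeros((d, d))
for _ in range(trials):
    P = np.eye(d)
    for _ in range(2):               # product of two independent projectors P_2 P_1
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)
        P -= np.outer(v, v @ P)
    S += P
E = S / trials                       # Monte Carlo estimate of E(P_2 P_1)
print(np.diag(E))                    # diagonal entries near c^2
print(np.abs(E - c ** 2 * np.eye(d)).max())
```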

user1551
  • In the linked posts, there are some visualizations of the eigenvalue and singular value distributions for a matrix with $d=40000$ and they appear to only concentrate about 0 and 1, not the mean. – whpowell96 Dec 26 '24 at 16:09
  • With 'slightly more intensive' computations, one can write out the expression for Var(tr(A)) and show the concentration (Var(tr(A))/d -> 0) – Xiangxiang Xu Dec 29 '24 at 02:40
  • @XiangxiangXu That’s great. Would you like to write it out in an answer? – user1551 Dec 29 '24 at 08:47
  • @user1551 I wish it were some stupid error, but it seems my Var(tr(A)) converges to a constant (without being scaled by $d$)... That seems to be a very strong concentration (see below). – Xiangxiang Xu Dec 29 '24 at 17:48
  • @user1551 btw, a follow-up question in case it is of interest https://mathoverflow.net/questions/485413/trace-inequality-involving-random-projections – Yaroslav Bulatov Jan 06 '25 at 06:29

We can abstract each update as $$B = (I - vv^T) A,\tag{*}$$ where $v$ is chosen from the unit hypersphere independently of $A$. Our derivation will use the following facts: $$ \mathbb{E}[vv^T] = \frac{1}{d} \cdot I, \qquad \mathbb{E}[(v^TMv)^2] = \frac{1}{d(d+2)} \left[\operatorname{tr}^2(M) + \operatorname{tr}(M^TM)+ \operatorname{tr}(M^2)\right]. $$
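The second-moment formula can be spot-checked by Monte Carlo; a sketch assuming NumPy, with an arbitrary test matrix $M$ of my choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5, 500_000
M = rng.standard_normal((d, d))                  # arbitrary fixed test matrix
v = rng.standard_normal((n, d))
v /= np.linalg.norm(v, axis=1, keepdims=True)    # n isotropic unit vectors
quad = np.einsum("ni,ij,nj->n", v, M, v)         # v^T M v for each sample
mc = (quad ** 2).mean()
exact = (np.trace(M) ** 2 + np.trace(M.T @ M) + np.trace(M @ M)) / (d * (d + 2))
print(mc, exact)                                 # agree to Monte Carlo accuracy
```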


From (*) we get $$ \begin{align} \operatorname{tr}(B) &= \operatorname{tr}(A) - v^TAv \end{align} $$ and $$\operatorname{tr}^2(B) = \operatorname{tr}^2(A) - 2 \operatorname{tr}(A) \cdot v^TAv + (v^TAv)^2.$$

Taking the conditional expectation, we have $$ \mathbb{E}[\operatorname{tr}^2(B)|A] = \left(1-\frac{2}{d}\right)\operatorname{tr}^2(A) + \frac{1}{d(d+2)}\left[\operatorname{tr}^2(A) + \operatorname{tr}(A^TA)+ \operatorname{tr}(A^2)\right].\tag{1} $$ Similarly, as $B^2 = (I - vv^T) A(I - vv^T) A = A^2 - vv^T A^2 - Avv^TA + vv^TAvv^TA$, we have $$\operatorname{tr}(B^2) = \operatorname{tr}(A^2) - 2v^TA^2v + (v^TAv)^2$$ and $$\mathbb{E}[\operatorname{tr}(B^2)|A] = \left(1-\frac{2}{d}\right)\operatorname{tr}(A^2) + \frac{1}{d(d+2)} \left[\operatorname{tr}^2(A) + \operatorname{tr}(A^TA)+ \operatorname{tr}(A^2)\right].\tag{2}$$ Also, since $B^TB = A^T(I-vv^T)A = A^TA - A^Tvv^TA$, we have $$\mathbb{E}[\operatorname{tr}(B^TB)|A] = \left(1-\frac{1}{d}\right) \operatorname{tr}(A^TA) .\tag{3}$$
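Identity (1) can be spot-checked for a single fixed $A$ by averaging over $v$; a sketch assuming NumPy ($A$ here is an arbitrary matrix of my choosing):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 6, 400_000
A = rng.standard_normal((d, d)) / np.sqrt(d)     # arbitrary fixed A
v = rng.standard_normal((n, d))
v /= np.linalg.norm(v, axis=1, keepdims=True)    # isotropic unit vectors
trB = np.trace(A) - np.einsum("ni,ij,nj->n", v, A, v)   # tr(B) = tr(A) - v^T A v
mc = (trB ** 2).mean()                           # Monte Carlo E[tr^2(B) | A]
exact = (1 - 2 / d) * np.trace(A) ** 2 + (
    np.trace(A) ** 2 + np.trace(A.T @ A) + np.trace(A @ A)
) / (d * (d + 2))
print(mc, exact)
```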


Taking another expectation of (1)-(3), we get $$ \begin{bmatrix} \mathbb{E}[\operatorname{tr}^2(B)]\\ \mathbb{E}[\operatorname{tr}(B^2)]\\ \mathbb{E}[\operatorname{tr}(B^TB)] \end{bmatrix} = C \begin{bmatrix} \mathbb{E}[\operatorname{tr}^2(A)]\\ \mathbb{E}[\operatorname{tr}(A^2)]\\ \mathbb{E}[\operatorname{tr}(A^TA)] \end{bmatrix},\qquad C = \begin{bmatrix} 1-\frac{2}{d} + \frac{1}{d(d+2)} & \frac{1}{d(d+2)} & \frac{1}{d(d+2)}\\ \frac{1}{d(d+2)} & 1-\frac{2}{d} + \frac{1}{d(d+2)} & \frac{1}{d(d+2)}\\ 0 & 0 & 1-\frac{1}{d} \end{bmatrix}. $$

The matrix $C$ has eigenvalues $\lambda_1 = 1-\frac{2}{d}+ \frac{2}{d(d+2)}, \lambda_2 = 1-\frac{2}{d}, \lambda_3 = 1-\frac{1}{d}$, with corresponding eigenvectors $(1, 1, 0)^T, (1, -1, 0)^T, (1, 1, d)^T$.
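These eigenpairs can be verified numerically; a sketch assuming NumPy, with an arbitrary small $d$:

```python
import numpy as np

d = 7                                   # illustrative choice; any d > 2 works
e1 = 1 / (d * (d + 2))
C = np.array([
    [1 - 2 / d + e1, e1,             e1],
    [e1,             1 - 2 / d + e1, e1],
    [0,              0,              1 - 1 / d],
])
pairs = [
    (1 - 2 / d + 2 * e1, np.array([1.0, 1.0, 0.0])),
    (1 - 2 / d,          np.array([1.0, -1.0, 0.0])),
    (1 - 1 / d,          np.array([1.0, 1.0, float(d)])),
]
ok = [np.allclose(C @ w, lam * w) for lam, w in pairs]
print(ok)                               # each claimed pair satisfies C w = lam w
```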

The initial condition is $A = I$, and $\begin{bmatrix} \mathbb{E}[\operatorname{tr}^2(A)]\\ \mathbb{E}[\operatorname{tr}(A^2)]\\ \mathbb{E}[\operatorname{tr}(A^TA)] \end{bmatrix} = \begin{bmatrix}d^2\\ d\\ d\end{bmatrix} = \frac{d^2 + d - 2}{2} \begin{bmatrix} 1 \\ 1 \\ 0\end{bmatrix} + \frac{d^2 - d}{2} \begin{bmatrix} 1 \\ -1 \\ 0\end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ d\end{bmatrix} $.


Therefore, after iterating $d$ steps, we obtain $$\begin{align} \mathbb{E}[\operatorname{tr}^2(A)] &= \frac{d^2 + d - 2}{2} \cdot \lambda_1^d + \frac{d^2 - d}{2} \cdot \lambda_2^d + \lambda_3^d\\ &= \frac{d^2 + d - 2}{2} \cdot \left(1-\frac{2}{d}+ \frac{2}{d(d+2)}\right)^d + \frac{d^2 - d}{2} \cdot \left(1-\frac{2}{d}\right)^d + \left(1-\frac{1}{d}\right)^d.\end{align} $$ As $\mathbb{E}[\operatorname{tr}(A)] = \left(1-\frac{1}{d}\right)^d d$, we have $$ \begin{align} \operatorname{var}(\operatorname{tr}(A)) &= \mathbb{E}[\operatorname{tr}^2(A)] - (\mathbb{E}[\operatorname{tr}(A)])^2 \\ &= \frac{d^2 + d - 2}{2} \cdot \left(1-\frac{2}{d}+ \frac{2}{d(d+2)}\right)^d + \frac{d^2 - d}{2} \cdot \left(1-\frac{2}{d}\right)^d + \left(1-\frac{1}{d}\right)^d - \left(1-\frac{1}{d}\right)^{2d} d^2, \end{align} $$ which has the constant limit $\frac{2e - 3}{2 e^2}$ (from Wolfram Alpha).
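Evaluating this closed form for growing $d$ illustrates the claimed limit; a sketch assuming NumPy (`var_tr` is my own name):

```python
import numpy as np

def var_tr(d):
    """Var(tr(A)) from the closed form above."""
    l1 = 1 - 2 / d + 2 / (d * (d + 2))
    l2 = 1 - 2 / d
    l3 = 1 - 1 / d
    return ((d * d + d - 2) / 2 * l1 ** d
            + (d * d - d) / 2 * l2 ** d
            + l3 ** d
            - l3 ** (2 * d) * d * d)

limit = (2 * np.e - 3) / (2 * np.e ** 2)
for d in (10, 100, 1000, 10_000):
    print(d, var_tr(d), limit)          # approaches (2e-3)/(2e^2) ≈ 0.1649
```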


Remarks:

  1. This approach can be extended to analyze $\operatorname{var}(\|A\|_F^2) = \operatorname{var}(\operatorname{tr}(A^TA))$, which converges to $1/e^2$ (see below);
  2. As a byproduct, we also get

$$ \begin{align}\mathbb{E}[\operatorname{tr}(A^2)] &= \frac{d^2 + d - 2}{2} \left(1-\frac{2}{d}+ \frac{2}{d(d+2)}\right)^d - \frac{d^2 - d}{2}\left(1-\frac{2}{d}\right)^d + \left(1-\frac{1}{d}\right)^d\\ &= \frac{2d}{e^2} + \frac{e - 3}{e^2} + O\left(\frac1d\right) \end{align} \tag{P1} $$ (asymptotic from Wolfram Alpha)


On $\operatorname{var}(\operatorname{tr}(A^TA))$:

Since $$B^TB = A^T(I-vv^T)A = A^TA - A^Tvv^TA,\tag{#}$$ we have $ \operatorname{tr}(B^TB) = \operatorname{tr}(A^TA) - v^TAA^Tv$, and $\operatorname{tr}^2(B^TB) = \operatorname{tr}^2(A^TA) - 2 \operatorname{tr}(A^TA) \cdot v^TAA^Tv + (v^TAA^Tv)^2. $ Therefore,

$$ \mathbb{E}[\operatorname{tr}^2(B^TB)|A] = \left(1 - \frac{2}{d}\right)\operatorname{tr}^2(A^TA) + \frac{1}{d(d+2)} \left[\operatorname{tr}^2(A^TA) + 2\operatorname{tr}((A^TA)^2)\right]. $$

Similarly, from (#), we obtain $$\begin{align}(B^TB)^2 &= (A^TA - A^Tvv^TA)(A^TA - A^Tvv^TA) \\ &= (A^TA)^2 - A^TAA^Tvv^TA - A^Tvv^TAA^TA + (A^Tvv^TA)^2 \end{align}$$ and thus

$$\mathbb{E}[\operatorname{tr}((B^TB)^2)|A] = \left(1 - \frac{2}{d}\right)\operatorname{tr}((A^TA)^2) + \frac{1}{d(d+2)} \left[\operatorname{tr}^2(A^TA) + 2\operatorname{tr}((A^TA)^2)\right]. $$


Taking another expectation, we obtain

$$ \begin{bmatrix} \mathbb{E}[\operatorname{tr}^2(B^TB)]\\ \mathbb{E}[\operatorname{tr}((B^TB)^2)] \end{bmatrix} = C' \begin{bmatrix} \mathbb{E}[\operatorname{tr}^2(A^TA)]\\ \mathbb{E}[\operatorname{tr}((A^TA)^2)] \end{bmatrix},\qquad C' = \begin{bmatrix} 1-\frac{2}{d} + \frac{1}{d(d+2)} & \frac{2}{d(d+2)} \\ \frac{1}{d(d+2)} & 1-\frac{2}{d} + \frac{2}{d(d+2)} \end{bmatrix}. $$

The matrix $C'$ has eigenvalues $\lambda_1 = 1-\frac{2}{d} + \frac{3}{d(d+2)}, \lambda_2 = 1-\frac{2}{d}$, and corresponding eigenvectors $(1, 1)^T$ and $(2, -1)^T$.
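As before, the eigenpairs of $C'$ can be checked numerically; a sketch assuming NumPy:

```python
import numpy as np

d = 7                                    # illustrative choice; any d > 2 works
e1 = 1 / (d * (d + 2))
Cp = np.array([
    [1 - 2 / d + e1, 2 * e1],
    [e1,             1 - 2 / d + 2 * e1],
])
pairs = [
    (1 - 2 / d + 3 * e1, np.array([1.0, 1.0])),
    (1 - 2 / d,          np.array([2.0, -1.0])),
]
ok = [np.allclose(Cp @ w, lam * w) for lam, w in pairs]
print(ok)                                # each claimed pair satisfies C' w = lam w
```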

The initial condition is $ A = I$ and $\begin{bmatrix} \mathbb{E}[\operatorname{tr}^2(A^TA)]\\ \mathbb{E}[\operatorname{tr}((A^TA)^2)] \end{bmatrix} = \begin{bmatrix} d^2\\d \end{bmatrix} = \frac{d^2+2d}{3} \begin{bmatrix} 1\\1\end{bmatrix} + \frac{d^2-d}{3} \begin{bmatrix} 2\\-1\end{bmatrix} $.


Therefore, after iterating $d$ steps, we obtain $$\begin{align} \mathbb{E}[\operatorname{tr}^2(A^TA)] &= \frac{d^2+2d}{3} \left(1-\frac{2}{d} + \frac{3}{d(d+2)}\right)^d + 2 \cdot \frac{d^2-d}{3} \left(1-\frac{2}{d}\right)^d .\end{align} $$ As $\mathbb{E}[\operatorname{tr}(A^TA)] = \left(1-\frac{1}{d}\right)^d d$, we have $$ \begin{align} \operatorname{var}(\operatorname{tr}(A^TA)) &= \mathbb{E}[\operatorname{tr}^2(A^TA)] - (\mathbb{E}[\operatorname{tr}(A^TA)])^2 \\ &= \frac{d^2+2d}{3} \left(1-\frac{2}{d} + \frac{3}{d(d+2)}\right)^d + 2 \cdot \frac{d^2-d}{3} \left(1-\frac{2}{d}\right)^d - \left(1-\frac{1}{d}\right)^{2d} d^2, \end{align} $$ which has the constant limit $1/e^2$ (from Wolfram Alpha).
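Evaluating this closed form numerically illustrates the $1/e^2$ limit; a sketch assuming NumPy (`var_fro2` is my own name):

```python
import numpy as np

def var_fro2(d):
    """Var(tr(A^T A)) = Var(||A||_F^2) from the closed form above."""
    l1 = 1 - 2 / d + 3 / (d * (d + 2))
    l2 = 1 - 2 / d
    l3 = 1 - 1 / d
    return ((d * d + 2 * d) / 3 * l1 ** d
            + 2 * (d * d - d) / 3 * l2 ** d
            - l3 ** (2 * d) * d * d)

for d in (10, 100, 1000, 10_000):
    print(d, var_fro2(d), 1 / np.e ** 2)   # approaches 1/e^2 ≈ 0.1353
```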

We also get $$ \begin{align} \mathbb{E}[\operatorname{tr}((A^TA)^2)] &= \frac{d^2+2d}{3} \left(1-\frac{2}{d} + \frac{3}{d(d+2)}\right)^d - \frac{d^2-d}{3} \left(1-\frac{2}{d}\right)^d\\ &= \frac{2d}{e^2} -\frac{1}{2e^2} + O\left(\frac1d\right). \end{align}\tag{P2} $$ (asymptotic from Wolfram Alpha)

Comparing (P2) with (P1), we see that the two expectations share the same linear (leading) term. This is related to the case $p = 2$ discussed in the post "Why is $\operatorname{Tr}(A^p)\approx \sum_i \sigma_i^{2p}$ for this random matrix?".
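Evaluating (P1) and (P2) side by side shows the shared leading term $2d/e^2$; a sketch assuming NumPy (function names are mine):

```python
import numpy as np

def m_tr_A2(d):
    """E[tr(A^2)] from (P1)."""
    l1 = 1 - 2 / d + 2 / (d * (d + 2))
    l2 = 1 - 2 / d
    l3 = 1 - 1 / d
    return (d * d + d - 2) / 2 * l1 ** d - (d * d - d) / 2 * l2 ** d + l3 ** d

def m_tr_AtA2(d):
    """E[tr((A^T A)^2)] from (P2)."""
    l1 = 1 - 2 / d + 3 / (d * (d + 2))
    l2 = 1 - 2 / d
    return (d * d + 2 * d) / 3 * l1 ** d - (d * d - d) / 3 * l2 ** d

for d in (100, 1000, 10_000):
    # both track 2d/e^2; they differ only in the O(1) constant
    print(d, m_tr_A2(d), m_tr_AtA2(d), 2 * d / np.e ** 2)
```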

  • Thanks for your work. It’s very nice! By the way, how do you obtain the formula $E[(v^TMv)^2] = \frac{1}{d(d+2)} \left[\operatorname{tr}^2(M) + \operatorname{tr}(M^TM)+ \operatorname{tr}(M^2)\right]$. I can use $E[v^Tvv^Tv]=1$ and the invariance of $E[(v^TMv)^2]$ under orthogonal similarity to set up two linear equations in $a=E[v_1^2]$ and $b=E[v_1^2v_2^2]$ and obtain the formula without difficulty, but I have not seen this formula before. Is it well-known in the literature? Any reference? – user1551 Dec 29 '24 at 20:19
  • impressive; it may be helpful to note that the $v$'s have the same distribution as the matrix elements in a single row (or single column) of a $d\times d$ orthogonal random matrix with Haar measure, and in that context the formula is well known, in the form $\mathbb{E}[v_i v_j v_k v_l] = \frac{1}{d(d+2)}[\delta_{ij} \delta_{kl} + \delta_{ik} \delta_{jl} + \delta_{il} \delta_{jk}]$. – Carlo Beenakker Dec 29 '24 at 21:01
  • more general formulas of this type follow from Weingarten calculus – Carlo Beenakker Dec 29 '24 at 21:14
  • @user1551 It can be done by applying the result for $\mathbb{E}[v_iv_jv_kv_l]$ (as pointed out in the comments above), and the only non-trivial cases are $\mathbb{E}[v_1^4]$ and $\mathbb{E}[v_1^2v_2^2]$. – Xiangxiang Xu Dec 29 '24 at 22:18
  • the generalization to higher powers seems to be of the form $\mathbb{E}[\operatorname{tr}A^p]=\frac{dp^{p-1}}{(p-1)!e^p}+{\cal O}(d^0)$; I still need to check that this also agrees with $\mathbb{E}[\operatorname{tr}(A^\top A)^p]$. – Carlo Beenakker Dec 30 '24 at 17:18
  • @CarloBeenakker Your formula well matches the one plotted in the other post in the case $d = 100$. The $(p-1)!$ could be extended to $\Gamma(p)$. – Xiangxiang Xu Dec 30 '24 at 17:27
  • I've tested numerically for $d=3$ and see close agreement with P1,P2 formulas. There's a slight difference between P1 and P2, which is clear empirically if we run for more than 100k samples: https://www.wolframcloud.com/obj/yaroslavvb/newton/forum-sval-eval-compare.nb – Yaroslav Bulatov Jan 05 '25 at 00:01