
Suppose $\Lambda$ is an $n \times n$ real diagonal matrix with positive diagonal entries. I'm interested in the quantity $$ S = {\rm Tr}[U_1 \Lambda U_1^\dagger\, U_2 \Lambda U_2^\dagger\, \cdots\, U_m \Lambda U_m^\dagger] $$ for finite $m$, where each $U_i$ is an arbitrary $n \times n$ complex unitary matrix.

My question is: is it true that $\text{Re}(S) \leq {\rm Tr}[\Lambda^m]$, i.e. is it maximal when all $\{U_i\}$ are equal? If so, does anything change if $\Lambda$ is non-negative rather than positive?

I have been able to prove that this is true for $m = 2$ by using this answer and setting $D = \Lambda$ and $X = U_1^\dagger U_2 \Lambda U_2^\dagger U_1$. To me it seems intuitive that it would hold for all $m$, but I haven't been able to generalize. Any help would be much appreciated!

N.B. If it's easier, I could assume that $U_i$ are real orthogonal rather than complex unitary. But the more general case would be preferred.
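
For what it's worth, here is a quick numerical sanity check of the conjecture on Haar-random unitaries (a minimal sketch assuming NumPy; `haar_unitary` is a homemade helper, not a library routine):

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-random n x n unitary via QR of a complex Gaussian matrix."""
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))  # fix column phases so Q is Haar-distributed

rng = np.random.default_rng(0)
n, m = 4, 5
Lam = np.diag(rng.uniform(0.1, 2.0, size=n))  # positive diagonal matrix

min_gap = np.inf
for _ in range(2000):
    prod = np.eye(n, dtype=complex)
    for _ in range(m):
        U = haar_unitary(n, rng)
        prod = prod @ U @ Lam @ U.conj().T
    min_gap = min(min_gap, np.trace(Lam**m).real - np.trace(prod).real)

print("min of Tr(Lambda^m) - Re(S) over trials:", min_gap)  # stays >= 0
```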

user
  • Here's an alternative way to get the statement for $m=2$: Denote $A_j = U_j\Lambda U_j^\dagger$. Note that $A_j = A_j^\dagger$. Let $\langle A,B \rangle$ denote the Frobenius inner product, defined by $$ \langle A,B \rangle = \operatorname{tr}(A^\dagger B). $$ The Cauchy-Schwarz inequality tells us that $$ |\operatorname{tr}(AB)|^2 = |\langle A,B^\dagger \rangle|^2 \leq \langle A,A \rangle \cdot \langle B^\dagger,B^\dagger \rangle = \operatorname{tr}(A^\dagger A) \cdot \operatorname{tr}(B B^\dagger). $$ – Ben Grossmann Aug 03 '21 at 13:18
  • With that, we get the statement for $m = 2$ as follows: $$ |\operatorname{tr}(A_1A_2)|^2 \leq \operatorname{tr}(A_1^2)\cdot \operatorname{tr}(A_2^2) = \operatorname{tr}(\Lambda^2)^2. $$ – Ben Grossmann Aug 03 '21 at 13:18
  • Applying this inequality inductively gives us the general result $$ \vert\operatorname{tr}(A_1 \cdots A_m)\vert \leq \operatorname{tr}(\Lambda^2)^{m/2}. $$ This is not what you asked for, but maybe you will find it useful anyway (a numerical check of this bound appears just after this comment thread). – Ben Grossmann Aug 03 '21 at 13:23
  • Thanks for the neat proof! The last inequality you give is incidentally also useful for me, will see if a similar strategy can be used to get the other inequality – user Aug 03 '21 at 13:43
  • $Re(S)$ is ambiguous but.... yes $\text{Re}\Big(\text{trace}\big(S\big)\Big)\leq \Big\vert\text{trace}\big(S\big)\Big\vert \leq \text{trace}\Big(\Lambda^m\Big)$. The result follows from majorization. The $m=2$ case is von-Neumann trace inequality but I don't know a non-majorization way for proving $m\geq 3$. – user8675309 Aug 03 '21 at 17:44
  • @user8675309 What exactly is the majorization argument in this case? – Ben Grossmann Aug 03 '21 at 17:49
  • @BenGrossmann with $\Sigma_{X}$ being a diagonal matrix containing the singular values of $X$ ordered from largest to smallest, the key result from the theory of majorization is that $\Sigma_{AB}\preceq_w \Sigma_{A}\Sigma_{B}$ (with $\preceq_w$ denoting weak majorization) and by careful induction we have $\Sigma_{S} \preceq_w \Sigma_{A^m}$ (what lurks in the background is exploiting the relationship between weak and strong majorization and that strong majorization implies Birkhoff-von Neumann and then use re-arrangement inequality). – user8675309 Aug 03 '21 at 18:29
  • @user8675309 Somehow that result didn't come to mind. Even with that, I don't see how exactly one would get from $\Sigma_{A^{m-1}}\Sigma_A$ to $\Sigma_{A^m}$; I assume that comes down to what "lurks in the background". I would be interested in seeing that proof if you find the time to put it together. – Ben Grossmann Aug 03 '21 at 19:03
  • @BenGrossmann I suspect that pushing the result through by mapping the weak majorization to strong log majorization will give a much nicer and shorter proof... let me revert – user8675309 Aug 03 '21 at 19:19
  • For a majorization-free proof: read https://terrytao.wordpress.com/2010/07/15/the-golden-thompson-inequality/ up to and including "Remark 1" . For purposes of this question non-commutative Hoelder for Hermitian matrices is plenty. – user8675309 Aug 04 '21 at 18:33
  • @user8675309 Nice find! I haven't gotten to read closely through your proof, but I'm looking forward to both – Ben Grossmann Aug 04 '21 at 20:32
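
The $\operatorname{tr}(\Lambda^2)^{m/2}$ bound from the comments above can be tested numerically; a minimal sketch assuming NumPy (`haar_unitary` is the same homemade helper as in the question's snippet):

```python
import numpy as np

def haar_unitary(n, rng):
    # Haar-random unitary via QR of a complex Gaussian, with phase fix
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))

rng = np.random.default_rng(1)
n, m = 4, 3
Lam = np.diag(rng.uniform(0.1, 2.0, size=n))

Us = [haar_unitary(n, rng) for _ in range(m)]
As = [U @ Lam @ U.conj().T for U in Us]
lhs = abs(np.trace(np.linalg.multi_dot(As)))
rhs = np.trace(Lam @ Lam) ** (m / 2)
print(lhs, "<=", rhs)  # |tr(A_1...A_m)| <= tr(Lambda^2)^(m/2)
```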

1 Answer


Notation: $\Sigma_X$ denotes the diagonal matrix of singular values of $X$, ordered $\sigma_1\geq \sigma_2\geq\ldots\geq \sigma_n$.
For purposes of OP's problem we only need to consider invertible matrices; the singular case is handled by continuity at the end.

The desired inequality is:
$\text{real}\Big(\text{trace}\big(A_1A_2...A_m\big)\Big) \leq \Big\vert \text{trace}\big(A_1A_2...A_m\big)\Big\vert \leq \text{trace}\Big(\Sigma_{A_1A_2...A_m}\Big) \leq \text{trace}\Big(\Sigma_{A_1}^m\Big)$
where the first inequality comes from the fact that for a complex number $z = a+bi$ we have $a \leq \vert a\vert \leq \sqrt{a^2+b^2} = \vert z\vert$, the second inequality is the von Neumann trace inequality (via the polar decomposition), and the final inequality is proven below.
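
The middle inequality $\big\vert\text{trace}(X)\big\vert \leq \text{trace}\big(\Sigma_X\big)$ can be spot-checked on a random matrix; a minimal sketch assuming NumPy:

```python
import numpy as np

# |trace(X)| <= trace(Sigma_X): the modulus of the trace is bounded by
# the sum of the singular values (von Neumann / polar decomposition).
rng = np.random.default_rng(2)
X = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
print(abs(np.trace(X)), "<=", np.linalg.svd(X, compute_uv=False).sum())
```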

core result:
a result due to Horn (1950); see page 338 of Marshall, Olkin, and Arnold's "Inequalities: Theory of Majorization and Its Applications".

for invertible matrices $A$ and $B$
$\log\big(\Sigma_{AB}\big)\preceq \log\big(\Sigma_{A}\big)+\log\big(\Sigma_{B}\big)$
where $\preceq$ denotes (strong) majorization. The equality case (the reason the majorization is strong rather than weak), i.e. the fact that $\sum_{k=1}^n \log\big(\sigma_k(AB)\big) = \sum_{k=1}^n \Big(\log\big(\sigma_k(A)\big)+\log\big(\sigma_k(B)\big)\Big)$, should be clear: exponentiating each side, it is equivalent to $\det\Big(\Sigma_{AB}\Big) =\Big\vert \det\big(AB\big)\Big\vert =\Big\vert \det\big(A\big)\Big\vert\,\Big\vert \det\big(B\big)\Big\vert = \det\Big(\Sigma_{A}\Big)\det\Big(\Sigma_{B}\Big)$
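
Horn's result is easy to test numerically; a minimal sketch assuming NumPy (cumulative sums of the sorted vectors encode the majorization):

```python
import numpy as np

def log_sv(M):
    # np.linalg.svd returns singular values sorted largest to smallest
    return np.log(np.linalg.svd(M, compute_uv=False))

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Majorization via partial sums: every cumulative sum on the left is
# bounded by the one on the right, with equality for the full sum.
lhs = np.cumsum(log_sv(A @ B))
rhs = np.cumsum(log_sv(A) + log_sv(B))
print(np.all(lhs <= rhs + 1e-9), np.isclose(lhs[-1], rhs[-1]))
```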

OP's problem has
$A_k:= U_k\Lambda U_k^*$, so each $A_k$ has the same singular values (those of $\Lambda$), and we need to prove
$\text{trace}\Big(\Sigma_{A_1A_2...A_m}\Big) \leq \text{trace}\Big(\Sigma_{A_1}^m\Big)$
To do this we go through the following main claim about log-majorization.

main claim:
$\log\big(\Sigma_{A_1A_2...A_m}\big)\preceq \log\big(\Sigma_{A_1}\big)+\log\big(\Sigma_{A_2}\big)+...+\log\big(\Sigma_{A_m}\big)= m \cdot \log\big(\Sigma_{A_1}\big)= \log\big(\Sigma_{A_1}^m\big)$
we defer the proof of the main claim momentarily and note what it implies:
for $u\in \mathbb R$ the function $u\mapsto e^u$ is (strictly) convex, so for a real diagonal matrix $D$ the function $g$ given by composing the trace with the matrix exponential, i.e. $g\Big(D\Big) =\text{trace}\Big(\exp\big(D\big)\Big) = \sum_{k=1}^n e^{d_{k,k}}$, is Schur convex, hence

$\log\big(\Sigma_{A_1A_2...A_m}\big)\preceq \log\big(\Sigma_{A_1}^m\big)$
$\implies \text{trace}\Big(\Sigma_{A_1A_2...A_m}\Big)= g\Big(\log\big(\Sigma_{A_1A_2...A_m}\big)\Big)\leq g\Big(\log\big(\Sigma_{A_1}^m\big)\Big) =\text{trace}\Big(\Sigma_{A_1}^m\Big)$
which completes the proof of the core result.
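
To make the Schur-convexity step concrete, here is a toy illustration on vectors chosen by hand to satisfy the majorization (assumes NumPy; the numbers are hypothetical):

```python
import numpy as np

# If x is majorized by y (equal total sums, y more "spread out"), then
# the Schur-convex g = sum of exponentials is larger at y.
x = np.array([2.0, 1.5, 1.0, 0.5])
y = np.array([3.0, 1.5, 0.5, 0.0])  # same total sum as x
assert np.isclose(x.sum(), y.sum())
assert np.all(np.cumsum(x) <= np.cumsum(y) + 1e-12)
print(np.exp(x).sum(), "<=", np.exp(y).sum())
```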

proof of main claim:
we proceed by induction on $m$.
Base Case:
$\log\big(\Sigma_{A_1A_2}\big)\preceq \log\big(\Sigma_{A_1}\big)+\log\big(\Sigma_{A_{2}}\big)$
by Horn's result above.

Inductive Case:
for natural numbers $m\geq 3$
we know
$ \log\big(\Sigma_{A_1A_2...A_{m-1}}\big)\preceq \sum_{k=1}^{m-1}\log\big(\Sigma_{A_k}\big)$
by inductive hypothesis
and need to prove
$ \log\big(\Sigma_{A_1A_2...A_{m-1}A_m}\big)\preceq \sum_{k=1}^{m}\log\big(\Sigma_{A_k}\big)$

combining the Horn result (applied with the bunching $A = A_1A_2...A_{m-1}$, $B = A_m$) and the Inductive Hypothesis we have
(i) $ \log\big(\Sigma_{A_1A_2...A_{m-1}A_m}\big)= \log\big(\Sigma_{(A_1A_2...A_{m-1})A_m}\big) \preceq \log\big(\Sigma_{A_1A_2...A_{m-1}}\big)+\log\big(\Sigma_{A_{m}}\big)$ and
(ii) $ \log\big(\Sigma_{A_1A_2...A_{m-1}}\big)\preceq \sum_{k=1}^{m-1}\log\big(\Sigma_{A_k}\big)$

due to notational issues, I'm going to re-write this in vector notation as
(i) $\mathbf x \preceq \mathbf y + \mathbf z_m$ and
(ii) $\mathbf y \preceq \sum_{k=1}^{m-1}\mathbf z_k$
$\implies \mathbf x\preceq \big(\sum_{k=1}^{m-1}\mathbf z_k\big)+\mathbf z_m=\big(\sum_{k=1}^{m}\mathbf z_k\big)$
(keeping with the convention for singular value ordering, the vector components are ordered from largest to smallest)

The implication is almost immediate. Writing out the definition of majorization, we have for $r\in \big\{1,2,\ldots,n\big\}$
(i) $\sum_{i=1}^r x_{i} \leq \big(\sum_{i=1}^r y_i \big) +\big(\sum_{i=1}^r z^{(m)}_i\big)$
(ii) $\big(\sum_{i=1}^r y_i \big) \leq \sum_{k=1}^{m-1}\big(\sum_{i=1}^r z^{(k)}_i\big)$
$\implies \sum_{i=1}^r x_{i} \leq \sum_{k=1}^{m-1}\big(\sum_{i=1}^r z^{(k)}_i\big) +\big(\sum_{i=1}^r z^{(m)}_i\big)$
and we recover
$\mathbf x\preceq \big(\sum_{k=1}^{m}\mathbf z_k\big)$ i.e. we've proven
$ \log\big(\Sigma_{A_1A_2...A_{m-1}A_m}\big)\preceq \sum_{k=1}^{m}\log\big(\Sigma_{A_k}\big)$

(Technically the above only shows weak majorization but the fact that it is strong is immediate -- either the earlier determinant argument will do it, or note that (i) says $\text{sum}\big(\mathbf x\big)= \text{sum}\big(\mathbf y + \mathbf z_m\big) = \text{sum}\big(\mathbf y\big) + \text{sum}\big(\mathbf z_m\big)$ and (ii) says $\text{sum}\big(\mathbf y\big) = \text{sum}\big(\sum_{k=1}^{m-1}\mathbf z_k\big) = \sum_{k=1}^{m-1}\text{sum}\big(\mathbf z_k\big)$, and combining these gives the equality case.)
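
Putting the pieces together, the main claim can be checked end-to-end numerically; a minimal sketch assuming NumPy (random orthogonal $Q_k$ suffice for the check, per the OP's N.B.):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 5
Lam = np.diag(np.sort(rng.uniform(0.1, 2.0, size=n))[::-1])  # descending

# A_k = Q_k Lambda Q_k^T for random orthogonal Q_k (QR of a Gaussian)
Qs = [np.linalg.qr(rng.standard_normal((n, n)))[0] for _ in range(m)]
prod = np.linalg.multi_dot([Q @ Lam @ Q.T for Q in Qs])

# log sigma(A_1...A_m) is majorized by m * log sigma(Lambda)
lhs = np.cumsum(np.log(np.linalg.svd(prod, compute_uv=False)))
rhs = np.cumsum(m * np.log(np.diag(Lam)))
print(np.all(lhs <= rhs + 1e-9), np.isclose(lhs[-1], rhs[-1]))
```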

other questions in the OP:

is it maximal when all $\{U_i\}$ are equal?

Yes. The above proves this. Note: this is a sufficient condition, not a necessary one. E.g. $\Lambda =\delta I$ always meets the equality case. However, for $\Lambda \not \propto I$ the bound is not always attained, in the sense that we may always choose a collection of unitary matrices for which the inequality is strict. To prove this one need only consider the group of permutation matrices $S_n\subset U_n$, choosing $P_k:= I$ for $k\in\big\{1,2,\ldots,m-1\big\}$ and letting $P_m$ be the permutation matrix (a type 2 elementary matrix) which swaps two distinct diagonal entries of $\Lambda$ (such a pair exists since $\Lambda \not \propto I$). Application of the rearrangement inequality tells us the inequality is then strict.
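
A minimal numerical illustration of this strictness argument, with the hypothetical values $\Lambda = \operatorname{diag}(1,2,3)$ and $m=4$ (assumes NumPy):

```python
import numpy as np

Lam = np.diag([1.0, 2.0, 3.0])  # not proportional to I
m = 4
P = np.eye(3)[[2, 1, 0]]        # permutation swapping the 1st and 3rd entries
# A_1 = ... = A_{m-1} = Lambda, A_m = P Lambda P^T
S = np.trace(np.linalg.matrix_power(Lam, m - 1) @ P @ Lam @ P.T)
print(S, "<", np.trace(Lam**m))  # 46.0 < 98.0: strictly below the bound
```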

If so, does anything change if $\Lambda$ is non-negative rather than positive?

No. The log-space argument above would appear to require all singular values to be positive, but we can use continuity to work around this. Suppose $\Lambda$ is diagonal, PSD, and singular. Fixing some $m$, consider for $j \in \mathbb N$

$A_k^{(j)} := A_k + \frac{1}{2^j}I = U_k\big(\Lambda + \frac{1}{2^j}I\big)U_k^*$, which is PD for all $j$ (and is again a unitary conjugation of a common positive diagonal matrix, so the argument applies).

Running the above argument on $A_k^{(j)}$ for arbitrary $j$ tells us

$\text{trace}\Big(\Sigma_{A_1^{(j)}A_2^{(j)}...A_m^{(j)}}\Big) \leq \text{trace}\Big(\Sigma_{A_1^{(j)}}^m\Big)$
and using the continuity of the trace we take a limit of this sequence, i.e.
$\text{trace}\Big(\Sigma_{A_1A_2...A_m}\Big)=\lim_{j\to \infty}\text{trace}\Big(\Sigma_{A_1^{(j)}A_2^{(j)}...A_m^{(j)}}\Big) \leq \lim_{j\to \infty}\text{trace}\Big(\Sigma_{A_1^{(j)}}^m\Big)=\text{trace}\Big(\Sigma_{A_1}^m\Big)$
(and the strictness argument runs almost verbatim albeit in a block matrix form, attacking the non-singular block.)
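
A quick numerical check of the singular case, with the hypothetical choice $\Lambda = \operatorname{diag}(0,1,2)$ (assumes NumPy):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 4
Lam = np.diag([0.0, 1.0, 2.0])  # PSD but singular

Qs = [np.linalg.qr(rng.standard_normal((n, n)))[0] for _ in range(m)]
prod = np.linalg.multi_dot([Q @ Lam @ Q.T for Q in Qs])

# trace(Sigma_{A_1...A_m}) <= trace(Lambda^m) still holds
print(np.linalg.svd(prod, compute_uv=False).sum(), "<=", np.trace(Lam**m))
```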

user8675309
  • Thanks for this - very helpful. I'll need to read up about majorization I think.

    As a possible alternative, I found this paper this morning https://link.springer.com/article/10.1007%2Fs11117-006-2054-6 which seems to be applicable by setting $A_j = U_j \Lambda U_j^\dagger$ and $p_j = m$ for all $j = 1, \ldots, m$.

    – user Aug 05 '21 at 13:32
  • it's probably worth pointing out the above is a stronger result: $\log\big(\Sigma_{A_1...A_m}\big)\preceq \log\big(\Sigma_{A_1}...\Sigma_{A_m}\big)\implies \text{trace}\big(\Sigma_{A_1...A_m}\big) \leq \text{trace}\big(\Sigma_{A_1}...\Sigma_{A_m}\big) \leq \big \Vert \Sigma_{A_1}\big \Vert_{S_{p_1}}...\big \Vert \Sigma_{A_m}\big \Vert_{S_{p_m}}$ with Hölder conjugates $p_k$ (a check of the scalar step follows below). The proof of the RHS bound is identical to that of the scalar case. This is typical: majorization frequently allows you to reduce tough matrix singular value inequalities to diagonal matrices and then use known scalar results. – user8675309 Aug 05 '21 at 17:51
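
For the record, the scalar Hölder step in the last comment can be spot-checked as follows (a minimal sketch assuming NumPy, with the hypothetical choice $p_k = m$ for all $k$):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 3, 5
# z_k plays the role of the (sorted) singular values of A_k
zs = [np.sort(rng.uniform(0.0, 2.0, size=n))[::-1] for _ in range(m)]

# generalized Hoelder: sum_i prod_k z_k[i] <= prod_k ||z_k||_{p_k}
# whenever sum_k 1/p_k = 1; here p_k = m for every k.
lhs = np.prod(zs, axis=0).sum()
rhs = np.prod([np.sum(z**m) ** (1 / m) for z in zs])
print(lhs, "<=", rhs)
```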