3

I was trying to read Edelman et al.'s 1998 paper "The Geometry of Algorithms with Orthogonality Constraints", and since I have no differential geometry background and not much linear algebra, I am stuck at a few places.

This concerns Section 2.2.1, i.e. the tangent and normal spaces of the Stiefel manifold. Here is the excerpt:

Let $Z$ be any $n$-by-$p$ matrix. Let $\newcommand{\sym}{\operatorname{sym}}\sym(A)$ denote $(A + A^T)/2$ and $\newcommand{\skew}{\operatorname{skew}}\skew(A)$ denote $(A - A^T)/2$, then at $Y$, $\pi_N(Z) = Y \sym(Y^TZ)$ defines a projection of $Z$ onto the normal space. Similarly at $Y$, $\pi_T(Z) = Y \skew(Y^TZ) + (I - YY^T)Z$ is a projection of $Z$ onto the tangent space at $Y$.

I tried deriving $\pi_N(Z)$ as $\newcommand{\tr}{\operatorname{tr}}\pi_N(Z) = \tr(N^TZ)$ with $N = YS$, where $S$ is any $p$-by-$p$ symmetric matrix, but couldn't arrive at the result. Using that result, however, I could easily derive $\pi_T(Z)$, because $Z = \pi_T(Z) + \pi_N(Z)$.

Next, they go on to say that tangent directions $\Delta$ at $Y$ then have the general form $\Delta = YA + Y_{\perp}B$, where $A$ is $p$-by-$p$ skew-symmetric, $B$ is any $(n-p)$-by-$p$ matrix, and $Y_{\perp}$ is any $n$-by-$(n-p)$ matrix such that $YY^T + Y_{\perp}{Y_{\perp}}^T = I$.

I couldn't derive the general form of the tangents either. Can someone please help me out on this?

Thanks

Ben Steffan
tselvan
  • I wrote a general answer which justifies the $\Delta$ matrices. Please, tell me if you need further steps/clarifications ok? – Avitus Aug 27 '13 at 10:22

3 Answers

2

I will try with an informal approach.

  • Tangent space and curves on manifolds

Let $p$ be any point of a given manifold $S$ of dimension $n$. With $\gamma:[a,b]\rightarrow S$, $a<0<b$, we denote any path on $S$ passing through $p$, with $\gamma(0)=p$. To understand the tangent space $T_pS$ of $S$ at $p$, one needs to consider the tangent vectors

$$\frac{d(\phi\circ\gamma)}{dt}(0) $$

at $t=0$, denoting by $\phi:U\ni p\rightarrow \mathbb R^n$ any local chart around $p$. The derivative w.r.t. $t$ of $\phi\circ\gamma$ is computed using the standard techniques of Analysis. The tangent space at $p$ consists of the equivalence classes of such paths through $p$; two paths are equivalent if their first derivatives at $0$ coincide. One puts a linear space structure on $T_pS$ and checks that all constructions are chart-independent.

  • Tangent space on Stiefel manifolds

If we consider a Stiefel manifold $V_{n,p}$, some simplifications occur. In fact, any point $Y$ on the Stiefel manifold $V_{n,p}$ is an $n\times p$ matrix which satisfies $$Y^T Y=I. $$

This implies that any path $\gamma:[a,b]\rightarrow V_{n,p}$ has the form $t\mapsto \gamma(t)=Y(t)$, i.e. for every $t\in[a,b]$ the image $\gamma(t)$ is a matrix on the Stiefel manifold whose entries are functions of $t$. In other words, we can use standard calculus techniques to compute

$$\frac{d}{dt}\gamma $$

element-wise on the entries of $\gamma(t)\in V_{n,p}$. In order to find the relations defining the tangent space at $Y\in V_{n,p}$, we follow the above scheme. We begin by considering a path $\gamma$ on the Stiefel manifold s.t. $\gamma(t)=Y(t)$ and $\gamma(0)=Y(0):=Y$; the relations $Y(t)^T Y(t)=I$ are equivalent to

$$\gamma^T(t)\gamma(t)=1(t), ~~(*)$$

denoting by $1(t)$ the constant path in the space of $p\times p$ matrices, i.e. $1(t)=I$ for all $t$. We differentiate $(*)$, obtaining

$$\frac{d}{dt}\left(\gamma^T\gamma\right)=0,$$

and

$$\left(\frac{d\gamma}{dt}(0)\right)^T\gamma(0)+ \gamma(0)^T\frac{d\gamma}{dt}(0)=0$$

at $t=0$. Denoting by $\Delta:=\frac{d\gamma}{dt}(0)$ the velocity of the path at $t=0$ and recalling that, by definition, $\gamma(0)=Y$, we arrive at

$$\Delta^TY+ Y^T\Delta=0. $$

You can find the above formula on p. 307 of the reference. By construction, $\Delta$ is an arbitrary element of the tangent space at $Y$; equivalently, $Y^T\Delta$ is skew-symmetric. I hope it helps.
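The general form $\Delta = YA + Y_\perp B$ from the question follows from the tangent condition: inserting $I = YY^T + Y_\perp Y_\perp^T$ gives $\Delta = Y(Y^T\Delta) + Y_\perp(Y_\perp^T\Delta)$, so one can take $A = Y^T\Delta$ (skew-symmetric by the condition above) and $B = Y_\perp^T\Delta$ (arbitrary). Here is a small numerical sanity check of this, as a sketch only. The curve used (a Cayley transform of a random skew-symmetric matrix applied to $Y$) and all variable names are my own choices for illustration, not something taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3

# A point Y on the Stiefel manifold plus an orthonormal complement Y_perp,
# both taken from one full QR factorization of a random n-by-n matrix.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Y, Y_perp = Q[:, :p], Q[:, p:]   # Y^T Y = I and Y Y^T + Y_perp Y_perp^T = I

# Illustrative curve through Y that stays on the manifold: gamma(t) = C(t) Y,
# where C(t) = (I - tW/2)^{-1} (I + tW/2) is orthogonal for skew-symmetric W.
M = rng.standard_normal((n, n))
W = M - M.T

def gamma(t):
    I = np.eye(n)
    return np.linalg.solve(I - 0.5 * t * W, (I + 0.5 * t * W) @ Y)

# Velocity at t = 0 by a central finite difference (the exact value is W @ Y).
h = 1e-5
Delta = (gamma(h) - gamma(-h)) / (2 * h)

# Tangent condition: Delta^T Y + Y^T Delta = 0 (up to finite-difference error).
tangent_residual = np.linalg.norm(Delta.T @ Y + Y.T @ Delta)

# General form: Delta = Y A + Y_perp B with A skew-symmetric.
A = Y.T @ Delta
B = Y_perp.T @ Delta
skew_residual = np.linalg.norm(A + A.T)
reconstruction_residual = np.linalg.norm(Delta - (Y @ A + Y_perp @ B))

print(tangent_residual, skew_residual, reconstruction_residual)
```

All three printed residuals should be tiny. The first two are limited by the finite-difference step, while the reconstruction is exact up to rounding for any $\Delta$.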

  • On the projections onto the tangent, resp. normal space at $Y$

Let us consider the setting above. The normal space at $Y$ is the space

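$$\mathcal N_Y=\{N\in \mathbb R^{n\times p}: \operatorname{tr}(\Delta^TN)=0 \text{ for all } \Delta\in T_YV_{n,p}\}, $$

since $\langle A,B\rangle=\operatorname{tr}(A^TB)$ defines an inner product on $\mathbb R^{n\times p}$, as shown on p. 308. The important step is to show the following: for any $Z\in \mathbb R^{n\times p}$,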

$$A=\pi_N(Z):=Y\operatorname{sym}(Y^TZ)\in \mathcal N_Y.$$

In fact

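$$\operatorname{tr}(\Delta^TA)=\operatorname{tr}\left((\Delta^T Y)\operatorname{sym}(Y^TZ)\right)=0,$$

because $\Delta^TY$ is skew-symmetric by the tangent condition $\Delta^TY+Y^T\Delta=0$, $\operatorname{sym}(Y^TZ)$ is symmetric, and the trace of the product of a skew-symmetric matrix $K$ and a symmetric matrix $S$ vanishes: $\operatorname{tr}(KS)=\operatorname{tr}((KS)^T)=\operatorname{tr}(S^TK^T)=-\operatorname{tr}(SK)=-\operatorname{tr}(KS)$. Similarly, one proves that $\pi_T(Z)\in T_Y V_{n,p}$, i.e.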

$$\pi_T(Z)^T Y+Y^T\pi_T(Z)=0. $$

The final step towards the decomposition is to show that the equality

$$Z=\pi_T(Z)+\pi_N(Z), $$

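holds. This is immediate: since $\operatorname{skew}(Y^TZ)+\operatorname{sym}(Y^TZ)=Y^TZ$, the sum equals $YY^TZ+(I-YY^T)Z=Z$. In other words, each $Z$ is determined by its orthogonal splitting into tangential and normal components.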
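If you want to convince yourself numerically, here is a minimal NumPy sketch of the two projections from the question. The function names `proj_normal` and `proj_tangent` are just labels I made up. It checks the splitting $Z=\pi_T(Z)+\pi_N(Z)$, the tangent condition for $\pi_T(Z)$, orthogonality of the two parts in the trace inner product, and that $\pi_T$ is idempotent.

```python
import numpy as np

def sym(M):
    """Symmetric part (M + M^T) / 2."""
    return 0.5 * (M + M.T)

def skew(M):
    """Skew-symmetric part (M - M^T) / 2."""
    return 0.5 * (M - M.T)

def proj_normal(Y, Z):
    """pi_N(Z) = Y sym(Y^T Z)."""
    return Y @ sym(Y.T @ Z)

def proj_tangent(Y, Z):
    """pi_T(Z) = Y skew(Y^T Z) + (I - Y Y^T) Z."""
    return Y @ skew(Y.T @ Z) + Z - Y @ (Y.T @ Z)

rng = np.random.default_rng(1)
n, p = 7, 3
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))  # reduced QR: Y is n-by-p, Y^T Y = I
Z = rng.standard_normal((n, p))                   # arbitrary ambient matrix

T = proj_tangent(Y, Z)
N = proj_normal(Y, Z)

split_residual = np.linalg.norm(Z - (T + N))                   # Z = pi_T(Z) + pi_N(Z)
tangent_residual = np.linalg.norm(T.T @ Y + Y.T @ T)           # pi_T(Z) is tangent at Y
inner_product = np.trace(T.T @ N)                              # tr(T^T N) should vanish
idempotence_residual = np.linalg.norm(proj_tangent(Y, T) - T)  # pi_T(pi_T(Z)) = pi_T(Z)

print(split_residual, tangent_residual, inner_product, idempotence_residual)
```

All four numbers should be zero up to rounding error.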

Avitus
  • Avitus : thanks for your clear exposition of the fundamentals but my doubt is still untouched. I want to know about the projection of some matrix Z on the normal space and tangent space at X. – tselvan Aug 27 '13 at 12:12
  • Ok @Tirumarai: let me work on it :-) I will edit the answer, as soon as I get some computations – Avitus Aug 27 '13 at 12:27
  • Thanks Avitus: Also please look into the followup question right here : http://math.stackexchange.com/questions/477945/horizontal-and-vertical-tangent-space-of-orthogonal-group – tselvan Aug 28 '13 at 06:42
  • @Avitus Your link for the references seems not to work. Could you quickly state the book or article you are referring to? - Thank you in advance. – Mathematics enthusiast May 11 '23 at 17:56
0

It is apparent that $\pi_{N}(Z) \neq tr(N^{T} Z)$: $\pi_{N}(Z)$ should be an element of the ambient space $\mathbb R^{n\times p}$, whereas $tr(N^{T} Z)$ is only a real scalar, namely the inner product between $N$ and $Z$.
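One way to actually obtain $\pi_N(Z)$ is to look for the closest point to $Z$ in the normal space $\{YS : S = S^T\}$. This is only a sketch, under the assumption that $\pi_N$ is meant as the orthogonal projection with respect to the trace inner product. Using $Y^TY=I$ and the fact that the trace of a symmetric matrix times a skew-symmetric one vanishes,

$$\begin{aligned}
\|Z - YS\|_F^2 &= \|Z\|_F^2 - 2\operatorname{tr}(S\,Y^TZ) + \|S\|_F^2 \\
&= \|Z\|_F^2 - 2\operatorname{tr}\big(S\operatorname{sym}(Y^TZ)\big) + \|S\|_F^2 \\
&= \|Z\|_F^2 - \|\operatorname{sym}(Y^TZ)\|_F^2 + \|S - \operatorname{sym}(Y^TZ)\|_F^2 ,
\end{aligned}$$

which is minimised exactly at $S = \operatorname{sym}(Y^TZ)$, giving $\pi_N(Z) = Y\operatorname{sym}(Y^TZ)$.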

0

As someone who didn't have much differential geometry background, I found the explanation in the paper "Optimization algorithms exploiting unitary constraints" quite useful. The main point is that a general matrix $Y \in \mathbf{C}^{n \times p}$ can be decomposed as shown in Eq. 15,

\begin{equation} Y = XA + X_\perp B + X C \, , \end{equation}

in terms of a skew-Hermitian matrix $A \in \mathbf{C}^{p \times p}$, a Hermitian matrix $C \in \mathbf{C}^{p \times p}$, and an arbitrary matrix $B \in \mathbf{C}^{(n-p) \times p}$, with $X_\perp \in \mathbf{C}^{n\times(n-p)}$ the orthonormal complement of $X$, as defined in your question.

Using this decomposition, we can look at the projection $\pi(X + tY)$ from the perturbed point $X + tY$ onto the Stiefel manifold. We call $Y$ a direction. Lemma 8 shows

\begin{equation} \pi(X + t Y) = X + t(X A + X_\perp B) + \mathcal{O}(t^2) \, , \end{equation}

from which we can deduce that the normal space, i.e. the space of directions $N$ such that, for all sufficiently small $t$, $$\pi(X + t N) = X \, ,$$ can be parametrised as \begin{equation} N_X = \{ N \in \mathbf{C}^{n \times p}: N = XC, C \in \mathbf{C}^{p \times p}, C=C^H \} \, . \end{equation}
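The first-order expansion above can be checked numerically. In the sketch below I assume that $\pi$ is the nearest-point projection in the Frobenius norm and compute it as the polar factor $UV^H$ from a thin SVD. The helper names (`polar_projection`, `crandn`, `first_order_error`) are mine, not from the paper. If the expansion holds, the error should shrink by a factor of about 4 when $t$ is halved.

```python
import numpy as np

def polar_projection(M):
    """Polar factor U V^H of M: the closest matrix with orthonormal columns."""
    U, _, Vh = np.linalg.svd(M, full_matrices=False)
    return U @ Vh

rng = np.random.default_rng(2)
n, p = 6, 3

def crandn(*shape):
    """Complex matrix with independent standard normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# X with orthonormal columns and its orthonormal complement X_perp.
Q, _ = np.linalg.qr(crandn(n, n))
X, X_perp = Q[:, :p], Q[:, p:]

# Arbitrary direction Y, normalised to unit Frobenius norm.
Y = crandn(n, p)
Y /= np.linalg.norm(Y)

# Components of the decomposition Y = X A + X_perp B + X C.
G = X.conj().T @ Y
A = 0.5 * (G - G.conj().T)   # skew-Hermitian part of X^H Y
C = 0.5 * (G + G.conj().T)   # Hermitian part of X^H Y
B = X_perp.conj().T @ Y
decomposition_residual = np.linalg.norm(Y - (X @ A + X_perp @ B + X @ C))

def first_order_error(t):
    """Distance between pi(X + tY) and the claimed first-order expansion."""
    return np.linalg.norm(polar_projection(X + t * Y) - (X + t * (X @ A + X_perp @ B)))

err_t = first_order_error(1e-3)
err_half = first_order_error(5e-4)
ratio = err_t / err_half     # close to 4 if the error is O(t^2)

# A purely normal direction N = X C is projected back to X for small t.
normal_residual = np.linalg.norm(polar_projection(X + 1e-3 * (X @ C)) - X)

print(decomposition_residual, err_t, ratio, normal_residual)
```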

Since the tangent space is defined as the space of directions orthogonal to the elements in the normal space, using Lemma 8, the definition of $N_X$ and the real Euclidean inner product $\langle A, B \rangle = \operatorname{Re}\, tr (A B^H)$ (the real part is needed in the complex case), it is straightforward to prove that the tangent space is parametrised as

\begin{equation} T_X = \{ Z \in \mathbf{C}^{n \times p}: Z = XA + X_\perp B, A \in \mathbf{C}^{p \times p}, A + A^H = 0, B \in \mathbf{C}^{(n-p)\times p} \} \, . \end{equation}

Then, we can verify that

\begin{equation} \pi_{T, X}(Y) = (I - X X^H) Y + X \mathrm{skew}(X^H Y) \end{equation}

is the projection of an arbitrary matrix $Y \in \mathbf{C}^{n \times p}$ onto the tangent space, where $\mathrm{skew}(M) = (M - M^H)/2$. In the real case $^H$ reduces to $^T$, which recovers the formula in your question. Using the orthonormality condition $XX^H + X_\perp X_\perp^H = I$, we obtain the first term as

\begin{equation} (I - X X^H)Y = X_\perp X_\perp^H Y = X_\perp B \, . \end{equation}

For the second term we use

\begin{equation} X^H Y = A + C \, , \end{equation} followed by the $\mathrm{skew}$ operator, which picks the skew-Hermitian part $A$, \begin{equation} \mathrm{skew}(X^H Y) = A \, , \end{equation}

which we multiply by $X$ to get $X A$. Similarly, we derive the projection onto the normal space, \begin{equation} \pi_{N, X}(Y) = X \, \mathrm{sym}(X^H Y)\, , \end{equation} with $\mathrm{sym}(M) = (M + M^H)/2$, which picks the Hermitian part \begin{equation} \mathrm{sym}(X^H Y) = C \, , \end{equation} and multiplies it by $X$ to get $XC$.
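For completeness, a short NumPy sketch of the two projections in the complex setting, with conjugate transposes throughout. The function names are my own. It also illustrates why the real part is needed in the inner product: the complex number $tr(N^H T)$ is purely imaginary, and only its real part is guaranteed to vanish.

```python
import numpy as np

def herm(M):
    """Hermitian part (M + M^H) / 2."""
    return 0.5 * (M + M.conj().T)

def skew_herm(M):
    """Skew-Hermitian part (M - M^H) / 2."""
    return 0.5 * (M - M.conj().T)

def proj_tangent(X, Y):
    """(I - X X^H) Y + X skew(X^H Y)."""
    G = X.conj().T @ Y
    return (Y - X @ G) + X @ skew_herm(G)

def proj_normal(X, Y):
    """X sym(X^H Y)."""
    return X @ herm(X.conj().T @ Y)

rng = np.random.default_rng(3)
n, p = 6, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p)))
Y = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))

T = proj_tangent(X, Y)
N = proj_normal(X, Y)

split_residual = np.linalg.norm(Y - (T + N))            # Y = pi_T(Y) + pi_N(Y)
XhT = X.conj().T @ T
tangent_residual = np.linalg.norm(XhT + XhT.conj().T)   # X^H T is skew-Hermitian
pairing = np.trace(N.conj().T @ T)                      # complex-valued pairing
real_inner_product = pairing.real                       # the part that must vanish

print(split_residual, tangent_residual, real_inner_product, pairing.imag)
```

The first three printed values should be zero up to rounding. The last one is in general not zero.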