I have been working on Riemannian Newton optimization on the Grassmann manifold in the orthogonal-projector representation. The manifold is
$$\operatorname{Gr} (n,k) = \left\{ P \in \mathbb{R}^{n \times n} : P=P^\top, P^2=P, \operatorname{rank}(P) = k \right\}$$
and the tangent space at $P$ is
$$\operatorname{T}_P \operatorname{Gr}(n,k) = \left\{ X \in \mathbb{R}^{n \times n} : X = X^\top, X=PX+XP \right\}$$
The Riemannian Hessian is given by
$$\operatorname{Rhess} f(P)[X] = \operatorname{ad}_P \left( \operatorname{ad}_P \operatorname{Chess} f(P)[X] - \operatorname{ad}_{\operatorname{Cgrad} f(P)} X \right)$$
where $\operatorname{ad}_A B = AB - BA$, and $\operatorname{Rhess}$, $\operatorname{Chess}$, and $\operatorname{Cgrad}$ denote the Riemannian Hessian operator, the Cartesian (Euclidean) Hessian operator, and the Cartesian gradient, respectively.
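Unfolding the nested commutators with an intermediate variable $S$, the operator is applied in two stages, which is exactly the structure the algorithm below follows:
$$S = \operatorname{ad}_P \operatorname{Chess} f(P)[X] - \operatorname{ad}_{\operatorname{Cgrad} f(P)} X, \qquad \operatorname{Rhess} f(P)[X] = \operatorname{ad}_P S.$$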
Without a preconditioner, the truncated conjugate gradient method for the Newton equation converges very slowly, so I plan to implement a preconditioner. A preconditioner is essentially an (approximate) inverse of the (Riemannian) Hessian operator, so I am going to construct it by running the construction of the Riemannian Hessian operator in reverse. The following algorithm shows how the Riemannian Hessian operator works, following the formula above.
RiemannianHessianOperator(Parameters: P, CG, CH; Input: X; Output: RHX):
// P is the current point on the Grassmann manifold; symmetric.
// CG is the Cartesian gradient at P; symmetric.
// CH is the Cartesian Hessian operator at P; self-adjoint with respect to the Euclidean inner product.
// X is a point in the tangent space at P; symmetric.
// RHX is the Riemannian Hessian along X; symmetric; a point in the tangent space.
CHX = CH(X)           // Cartesian Hessian along X; CHX is symmetric.
A = P * CHX - CHX * P // A = ad_P(CHX)
B = CG * X - X * CG   // B = ad_CG(X)
S = A - B             // A, B, and S are intermediate matrices that are all skew-symmetric.
RHX = P * S - S * P   // RHX = ad_P(S); symmetric.
return RHX
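For concreteness, here is a minimal NumPy sketch of this operator. The function name, the argument layout, and the callable `chess` are my own stand-ins, not anything prescribed above:

```python
import numpy as np

def riemannian_hessian_operator(P, CG, chess, X):
    """Riemannian Hessian at P applied to a tangent vector X.

    P     : (n, n) orthogonal projector (symmetric, P @ P = P).
    CG    : (n, n) Cartesian gradient at P (symmetric).
    chess : callable mapping X to the Cartesian Hessian along X (symmetric).
    X     : (n, n) tangent vector at P (symmetric, X = P @ X + X @ P).
    """
    CHX = chess(X)            # Cartesian Hessian along X
    A = P @ CHX - CHX @ P     # ad_P(CHX), skew-symmetric
    B = CG @ X - X @ CG       # ad_CG(X), skew-symmetric
    S = A - B                 # skew-symmetric
    return P @ S - S @ P      # ad_P(S): symmetric, back in the tangent space
```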
And here is the reversed construction for the preconditioner:
Preconditioner(Parameters: P, CG, invCH; Input: RHX; Output: X):
// invCH is the (approximate) inverse Cartesian Hessian operator.
Solve RHX = P * S - S * P for S (1) // RHX is a point in the tangent space; symmetric.
B = CG * X - X * CG // A, B, and S are skew-symmetric.
// Note: B involves the output X, which is not yet known at this point, so in practice this term would have to be approximated or dropped.
A = B + S
Solve A = P * CHX - CHX * P for CHX (2)
X = invCH(CHX) // X is a point in the tangent space; symmetric.
return X
As you can see, the Preconditioner requires solving two Sylvester-like equations of the form $PY - YP = W$ for $Y$, where $P$ is a Grassmannian point and $Y$ and $W$ are (skew-)symmetric matrices. I asked about these two equations before. Solutions exist when $PWP = (I-P)W(I-P) = 0$, which is equivalent to $W = PW + WP$; for symmetric $W$ this says exactly that $W$ lies in the tangent space at $P$. The solutions are $Y = PW - WP + PUP + (I-P)V(I-P)$, where $U$ and $V$ are arbitrary matrices, skew-symmetric or symmetric depending on whether $Y$ is meant to be skew-symmetric (equation (1)) or symmetric (equation (2)).
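As a quick numerical sanity check of the particular solution $Y = PW - WP$ (the $U = V = 0$ case), here is a small NumPy script; the construction of the random tangent vector $W$ is just one convenient way to satisfy the existence condition:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3

# Random point on Gr(n, k) in the orthogonal-projector representation.
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
P = Q @ Q.T
I = np.eye(n)

# Random symmetric W with P W P = (I-P) W (I-P) = 0, i.e. a tangent vector at P.
H = rng.standard_normal((n, n))
H = H + H.T
W = P @ H @ (I - P) + (I - P) @ H @ P

# Particular solution of P Y - Y P = W with U = V = 0.
Y = P @ W - W @ P

print(np.linalg.norm(P @ Y - Y @ P - W))  # ~1e-15: Y solves the equation
print(np.linalg.norm(Y + Y.T))            # ~1e-15: Y is skew-symmetric
```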
Here are the two questions that prevent me from moving forward:
- I cannot guarantee that $PWP=(I-P)W(I-P)=0$ holds for equation (2).
For equation (1), RHX is a point in the tangent space, so with $W=\mathrm{RHX}$ the existence condition $PWP=(I-P)W(I-P)=0$ is fulfilled.
But the same cannot be said about equation (2), where $W=A$ is only known to be skew-symmetric.
- What about the multiple solutions coming from different $U$ and $V$?
I am thinking of setting $U$ and $V$ to zero, so that the solution reduces to $Y=PW-WP$ (see the sketch below). Is that a good way to construct the preconditioner?
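To make the $U = V = 0$ idea concrete, here is a hedged NumPy-style sketch of the whole preconditioner. It is only a sketch of the plan, not an answer to the two questions above: the $B$ term is dropped (it would need the unknown output $X$), equation (2) is solved as if the existence condition held, and `inv_chess` is a hypothetical callable standing in for the approximate inverse Cartesian Hessian:

```python
def preconditioner(P, CG, inv_chess, RHX):
    """Reverse construction with U = V = 0; assumptions flagged inline.

    CG is unused once B is dropped; it is kept only to mirror the pseudocode.
    """
    # Equation (1): particular solution of RHX = P S - S P with U = V = 0.
    S = P @ RHX - RHX @ P
    # Assumption: drop B = ad_CG(X), since X is the unknown output.
    A = S
    # Equation (2): same particular solution, ASSUMING P A P = (I-P) A (I-P) = 0.
    CHX = P @ A - A @ P
    # Apply the (approximate) inverse Cartesian Hessian.
    X = inv_chess(CHX)
    # Safeguard: project the symmetric result back onto the tangent space at P.
    return P @ X + X @ P - 2 * P @ X @ P
```

Since a preconditioner only has to approximate the inverse Hessian, such simplifications may be tolerable in practice, but whether they are is exactly what the two questions above are asking.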