
I am dealing with an optimization problem where I need to find an optimal rotation matrix. Let me first formulate the problem.

Input:

  • An initial rotation matrix $M\in SO(3)\subset\mathbb{R}^{3\times 3}$
  • A 3D point set $P$
  • A corresponding ground truth point set $G$

The rotation matrix is used to rotate the points by matrix multiplication, and some other non-linear transformations are applied as well. Suppose the transformation as a whole is denoted by $f_M(\cdot)$. Then the transformed point set is $P'=f_M(P)$. The problem is to find an optimal rotation matrix $M$ such that the transformed points $P'$ are closest to the ground truth points $G$, i.e.

$$ \min_{M}\sum_{i}\|G_i-f_M(P_i)\|_2^2 $$

where $P_i$ is the $i$-th point and $G_i$ is its corresponding ground truth point.

Since the problem is nonlinear, I am trying to solve it with gradient descent. Each iteration consists of the following steps:

  1. Compute loss $J=\sum_{i}\|G_i-f_M(P_i)\|_2^2$.

  2. Compute gradient $\partial J/\partial M$ and set $\Delta M=-\alpha(\partial J/\partial M)$ where $\alpha$ is the learning rate.

  3. Update $M\leftarrow M+\Delta M$.
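
For concreteness, here is a minimal numpy sketch of one such iteration, assuming for illustration that $f_M$ is just the rotation itself, $f_M(P_i)=MP_i$ (in my actual problem $f_M$ also involves the other non-linear transformations):

```python
import numpy as np

def loss_and_grad(M, P, G):
    # P, G: (N, 3) arrays, one point per row; M: 3x3 matrix.
    R = G - P @ M.T              # residuals G_i - M P_i, shape (N, 3)
    J = np.sum(R ** 2)           # loss J
    dJ_dM = -2.0 * R.T @ P       # gradient of J with respect to M
    return J, dJ_dM

def naive_step(M, P, G, alpha=1e-3):
    # One unconstrained update; the result generally leaves SO(3).
    _, dJ_dM = loss_and_grad(M, P, G)
    return M - alpha * dJ_dM
```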

The problem is that I need $M$ to be a rotation matrix, i.e. $M\in SO(3)$, but the update formula above is very likely to violate this constraint. I can think of two ways of solving this:

  1. After each update, project $M$ to $SO(3)$.

  2. Before each iteration, parametrize $M$ with parameters that naturally represent $SO(3)$, and use gradient descent on these parameters instead.

But I am not sure how to computationally obtain the projection to $SO(3)$ or the parametrization.

trisct

  • Look here to see how SVD can be applied to this problem: https://math.stackexchange.com/questions/2215359/showing-that-matrix-q-uvt-is-the-nearest-orthogonal-matrix-to-a – Apr 25 '21 at 03:17
  • https://en.m.wikipedia.org/wiki/Orthogonal_Procrustes_problem – user619894 Apr 25 '21 at 05:30

3 Answers


The simplest idea is the Cayley transform, which gives a bijection between skew-symmetric matrices and the orthogonal matrices with no eigenvalue $-1$ (in $SO(3)$, everything except rotations by exactly $\pi$). Skew-symmetric matrices form a vector space, so you can run your gradient descent there without any constraint. Note that this works in any dimension.
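
A minimal numpy sketch of the parametrization for $n=3$ (names illustrative, not from the answer): three free parameters give a skew-symmetric matrix $A$, and the Cayley transform turns it into a rotation. Gradient descent then runs on the three parameters.

```python
import numpy as np

def skew(a):
    # Map a 3-vector (a1, a2, a3) to the corresponding skew-symmetric matrix.
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def cayley(a):
    # Cayley transform: rotation (I - A)(I + A)^{-1}; I + A is always
    # invertible for skew-symmetric A. Covers all of SO(3) except
    # rotations by exactly pi.
    A = skew(a)
    I = np.eye(3)
    return (I - A) @ np.linalg.inv(I + A)
```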

Igor Rivin

One possible way to do this would be to use quaternions.

If $q$ is any nonzero quaternion, then $q$ acts on the space $\Im=\{ai+bj+ck|a,b,c\in \mathbb{R}\}$ of pure-imaginary quaternions by conjugation: we send any $x \in \Im$ to $qxq^{-1}$. This map turns out to be a rotation of the three-dimensional space $\Im$, so any nonzero quaternion yields some element of $SO(3)$ (with two distinct quaternions yielding the same element if and only if they are nonzero real multiples of each other).

Since the space $\mathbb{H}$ of quaternions is just a vector space, you can do your gradient descent in that space. The only issue is that the map from $\mathbb{H}$ to $SO(3)$ is discontinuous at $0$, so you want to arrange your gradient descent never to get too close to $0$. Since multiplying a quaternion by a nonzero real number doesn't change its corresponding rotation, you can probably do this by just renormalizing your quaternion after every step (or maybe every few steps) and choosing a small enough step size that no one step will ever take you very close to $0$ when starting from a unit quaternion.
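
For concreteness, a minimal numpy sketch of the resulting map (the standard unit-quaternion-to-rotation-matrix formula; the renormalization implements the suggestion above):

```python
import numpy as np

def quat_to_rotation(q):
    # Rotation matrix of the conjugation x -> q x q^{-1}.
    # Renormalizing first is harmless, since scaling q does not change
    # the rotation, and it keeps the descent away from 0.
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
```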

Micah

This problem is well known and has an exact closed-form solution, both for the $SO(3)$ case and for the more general $E(3)$ case. There's no need to do gradient descent. You should simply:

  1. Write your point sets $P$ and $G$ as $N\times 3$ matrices (one row of $x,y,z$ coordinates per point).
  2. Compute the covariance matrix $C=P^T G$.
  3. Compute the SVD $C = U \Sigma V^T$.
  4. Then your answer is $M=V \left[ {\begin{array}{ccc} 1&0&0 \\ 0&1&0 \\ 0 & 0 & \det(VU^T) \end{array} } \right]U^T$.

See "Least-Squares Rigid Motion Using SVD" for a proof

Several variations of this idea are available:

  • If you want to assign weights $w_i$ to individual points, compute the $N\times N$ diagonal weight matrix $W$ with the $w_i$ as diagonal elements and use $C = P^T W G$ in step 2.
  • If you are OK with merely orthogonal matrices (i.e. reflections as well as rotations are acceptable), step 4 can be replaced with just $M=VU^T$.
  • If you want to mix in translations and work in $E(3)$, first compute the centroids of $P$ and $G$, use them to build zero-centered versions $P'$ and $G'$ of both point sets, and compute $C$ from $P'$ and $G'$ instead of $P$ and $G$. Compute the rotation according to the formulas above, then add the translation that moves the $P$ centroid to the $G$ centroid on top of it.

If you still want to use gradient descent instead of these formulas, probably the best way is to work with the Lie algebra $\mathfrak{so}(3)$. The idea is that gradient descent essentially gives you a direction $v$ in which to move. In terms of differential geometry, this vector $v$ can be thought of as a tangent vector to the Lie group $SO(3)$. Elements of this tangent space form the Lie algebra $\mathfrak{so}(3)$, and they can be mapped back to the group by the exponential map. Starting from the current approximation $M_i$, you do the update as follows:

$$M_{i+1} = \exp(\alpha \cdot v)\, M_i$$

Then $M_{i+1}$ is an element of $SO(3)$ that lies, roughly speaking, a distance $\alpha\|v\|$ away from $M_i$ in the direction $v$. There is more than one way to do this trick, but one of the simplest is to use the matrix representation: $\mathfrak{so}(3)$ is the space of skew-symmetric $3\times 3$ matrices and $\exp$ is the matrix exponential. You can compute the latter using Rodrigues' rotation formula, and it can also be approximated by its well-known Taylor series. In particular, the first-order expansion $\exp(u) \approx I + u$ is widely used to approximate small rotation matrices. That is overkill for this problem, but it works like a charm for more complex optimization problems involving $N$ bodies.
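
A minimal numpy sketch of this update, using Rodrigues' formula for the exponential (the descent direction $v$, identified with a 3-vector, is assumed to be given):

```python
import numpy as np

def skew(v):
    # Identify a 3-vector with an element of so(3).
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def so3_exp(v):
    # Matrix exponential of skew(v) via Rodrigues' rotation formula.
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3) + skew(v)     # 1st-order Taylor for tiny angles
    K = skew(np.asarray(v) / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def lie_step(M, v, alpha):
    # M_{i+1} = exp(alpha * v) M_i stays exactly on SO(3).
    return so3_exp(alpha * np.asarray(v)) @ M
```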