
I've recently been working through DeTurck and Yang's Existence of elastic deformations with prescribed principal strains. First and foremost, I'm interested in its proof that in three dimensions, Riemannian metrics can always be diagonalized, that is, we can always find an atlas of coordinate functions such that the metric components satisfy $g_{i j} = 0$ for $i \neq j$ when represented in these coordinates, but I have some trouble fully understanding it.

Now the general idea of the proof is to choose an orthonormal frame $\{\overline{e_1}, \overline{e_2}, \overline{e_3}\}$ of vector fields on $M$ together with the corresponding dual basis $\{\overline{\omega}^1, \overline{\omega}^2, \overline{\omega}^3\}$ of one-forms (which they call the reference frame), and then to solve for a set of coordinate functions $\{x^1, x^2, x^3\}$ in which the metric becomes diagonal, with corresponding orthonormal coframe $\{\omega^1, \omega^2, \omega^3\}$.
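Concretely (this is my own paraphrase of the setup, not a quote from the paper), finding such coordinates means writing the metric as

$$g = g_{11}\,(dx^1)^2 + g_{22}\,(dx^2)^2 + g_{33}\,(dx^3)^2,$$

so the orthonormal coframe one is looking for has the form $\omega^i = \sqrt{g_{ii}}\,dx^i$ (no summation), i.e. each $\omega^i$ is a positive function multiple of an exact one-form.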

Now a large part of the proof comes down to some tedious but simple calculations involving the Frobenius theorem, the structure equations and some other fundamental constructs, but it's the final step of the proof that I cannot wrap my head around so far and that I'm hoping someone can explain to me. I hope it suffices to reproduce only the final part of the proof as well as an outline of the preliminary steps here; if anyone requires more of the proof, feel free to let me know.

Now let, as mentioned above, $\omega^i, i = 1, 2, 3$ be the orthonormal coframe to the desired coordinate functions $(x^1, x^2, x^3)$ in which our metric becomes diagonal, whose existence we want to show. The first major step in the proof is to show that the existence of such a coframe $\omega^i$, with the property that the corresponding frame diagonalizes the metric, is equivalent to the conditions

$$\omega^1 \wedge \omega^2 \wedge \omega_2^1 = 0, \quad \omega^1 \wedge \omega^3 \wedge \omega_3^1 = 0, \quad \omega^2 \wedge \omega^3 \wedge \omega_3^2 = 0 \tag{1}$$

where the $\omega_i^j$ are the corresponding connection forms.
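In case it helps intuition (this reformulation is mine, not the paper's): since the $\omega^i$ form a coframe, a one-form $\alpha$ satisfies $\omega^1 \wedge \omega^2 \wedge \alpha = 0$ exactly when $\alpha$ lies in the span of $\omega^1$ and $\omega^2$; indeed, writing $\alpha = a\,\omega^1 + b\,\omega^2 + c\,\omega^3$ gives

$$\omega^1 \wedge \omega^2 \wedge \alpha = c\,\omega^1 \wedge \omega^2 \wedge \omega^3,$$

so the three conditions in $(1)$ say that each connection form $\omega_j^i$ has no component in the direction of the remaining coframe element.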

Given the existence of such a coframe, one would be able to represent it with respect to the reference coframe $\overline{\omega}^j$ via $\omega^i = \sum_{j=1}^3 b_j^i \overline{\omega}^j$ for some coefficients $b_j^i$, where $(b_j^i) = b \in C^\infty(M, SO(3))$ (the matrix is pointwise a rotation because both coframes are orthonormal and, assuming matching orientations, the transition between them is one). In other words, it's sufficient to find such a matrix-valued function $b$ with this property.

Using some more calculations, they then rewrite equation $(1)$ without dependency on the desired coframe $\omega^i$, expressing it instead through the matrix components $b_j^i$; namely, they show that the existence of $\omega^i$ satisfying equation $(1)$ is equivalent to the existence of a $b \in C^\infty(M, SO(3))$ that satisfies:

$$0 = \sum_{p, q, j, k} b_p^i b_q^l \overline{\omega}^p \wedge \overline{\omega}^q \wedge \left( \frac 12 \left( b_k^l \overline{e}_k (b_j^i) - b_k^i \overline{e}_k (b_j^l) \right) \overline{\omega}^j + b_k^l b_j^i \overline{\omega}_k^j \right) \tag{2}$$
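As a sanity check that I found useful (my own computation, not in the paper): plugging the constant function $b_j^i \equiv \delta_j^i$ into $(2)$ kills all derivative terms and collapses the sums, leaving

$$\overline{\omega}^i \wedge \overline{\omega}^l \wedge \overline{\omega}_l^i = 0,$$

which is exactly condition $(1)$ for the reference coframe; so $b \equiv \operatorname{id}$ solves $(2)$ precisely when the reference frame itself already diagonalizes the metric.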

I'm sorry for this lengthy lead-up; my actual question starts here, as this is the point from which I can't follow anymore.

I'll just quote the rest of the proof step-by-step:

Proposition 4.8. The linearization of [$(2)$] is diagonal hyperbolic.

This is one of my main issues so far: DeTurck and Yang go on to prove this statement, but they don't say a word about how the existence of such functions $b_j^i$ follows from this fact. Why does the fact that the linearization of $(2)$ is diagonal hyperbolic guarantee the existence of a solution? What exactly are our $b_j^i$'s, and why do they exist because of it?

Proof. It suffices to linearize [(2)] around the frame where $b_j^i(x) \equiv \delta_j^i$, since we can choose the reference frame $\{\overline{\omega}^i\}$ to be equal to the frame $\{\omega^i\}$ around which we are linearizing.

Now... what exactly is happening here? Why can we choose the reference frame $\overline{\omega}^i$ to be (locally, I assume?) equal to the desired frame $\omega^i$? One of them is supposed to diagonalize the metric whereas the other in general does not, so why can they be chosen equal here?

Let $\beta_j^i = (d b)_j^i$ be the variation in $b$; $\beta_j^i$ is a skew-symmetric matrix-valued function.

My next problem here is: which variance exactly are they talking about? How is this variance defined? I'm sorry, but I'm not familiar with any definition of variance in this context and couldn't find one that makes sense so far. Do they mean something like the (exterior) derivative of $b$, since they write $d b$? Or is it something else entirely? Otherwise I only know variance from a stochastic context, and I highly doubt that's what they mean here.

The linearization of [(2)] is thus:

$$\frac 12 \left( \overline{e}_i \left(\beta_j^l\right) - \overline{e}_l \left(\beta_j^i\right) \right) \overline{\omega}^i \wedge \overline{\omega}^l \wedge \overline{\omega}^j + \text{lower order terms in } \beta = 0$$

for $(i, l, j) = (1, 2, 3), (2, 3, 1),$ and $(3, 1, 2)$.

I guess I could understand why this is the linearization if I knew what exactly the variance here is.

We write out the three equations for the linearization:

$$\frac 12 \left( \overline{e}_1 \left( \beta_3^2 \right) - \overline{e}_2 \left(\beta_3^1 \right) \right) + \text{lower order terms in } \beta = 0 $$

$$\frac 12 \left( \overline{e}_2 \left( \beta_1^3 \right) - \overline{e}_3 \left(\beta_1^2 \right) \right) + \text{lower order terms in } \beta = 0 $$

$$\frac 12 \left( \overline{e}_3 \left( \beta_2^1 \right) - \overline{e}_1 \left(\beta_2^3 \right) \right) + \text{lower order terms in } \beta = 0 $$

By alternately adding two of the equations together and subtracting the other, we obtain a system of the form:

$$\overline{e}_1(\beta_3^2) + \text{lower order terms in } \beta = 0 $$

$$\overline{e}_2(\beta_1^3) + \text{lower order terms in } \beta = 0$$

$$\overline{e}_3(\beta_2^1) + \text{lower order terms in } \beta = 0$$

This is obviously in diagonal form. q.e.d.
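To spell out the combination step for myself (my own verification, using the skew-symmetry $\beta_j^i = -\beta_i^j$ quoted above): adding the first and third equations and subtracting the second, the principal parts give

$$\frac 12 \left( \overline{e}_1(\beta_3^2) - \overline{e}_2(\beta_3^1) - \overline{e}_2(\beta_1^3) + \overline{e}_3(\beta_1^2) + \overline{e}_3(\beta_2^1) - \overline{e}_1(\beta_2^3) \right) = \overline{e}_1(\beta_3^2),$$

since $\beta_1^3 = -\beta_3^1$ and $\beta_2^1 = -\beta_1^2$ make the $\overline{e}_2$ terms and the $\overline{e}_3$ terms cancel in pairs, while $\beta_2^3 = -\beta_3^2$ doubles the first term; the other two diagonal equations follow from the remaining two sign patterns.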

This final part I can follow again: given the linearization they derived, I think I can see that the linearization here is diagonal. But, as mentioned above, how does the fact that this linearization is diagonal prove that a function $b \in C^\infty(M, SO(3))$ exists so that $(2)$ is satisfied? What's the connection here that I'm missing?

I realize that this is a lengthy question, and I hope I phrased it in a manner that makes clear what I can and cannot understand, and what answer(s) I seek here. To summarize, my main issues are understanding how the linearization of system $(2)$ being diagonal hyperbolic is sufficient for the existence of a matrix function $b$ that satisfies $(2)$, why the desired coframe and the reference coframe can be chosen equal locally, and what exactly this variance "$\beta_j^i = (d b)_j^i$" is and how it plays into the linearization of the system $(2)$. Any help would be greatly appreciated. If anyone desires more details about the paper or the preliminary steps of the proof, I'm happy to provide them.

moran
  • For "variation", they're talking about Calculus of Variations. For the rest -- those all seem like good questions. I wish I could answer more of them. – John Hughes Sep 09 '17 at 14:32
  • @JohnHughes Thank you for this tip, I'll look into it, although it seems like a big topic. Would you happen to have an idea what the variance for, in this case, a matrix-valued function $b: M \to SO(3)$ could be concretely defined as; what I could use as a definition here to work with? – moran Sep 09 '17 at 14:43
  • It's not "variance" but "variation". (I know...those should be the same, but they're terms of art). A "variation" is a small alteration of a curve, typically w some boundary conditions. For instance, you might take a curve $c:[0,1] \to R^2$, going from $A$ to $B$ and adjust it to $c + h$, where $h$ is a "variation" -- a function $h:[0,1] \to R^2$ such that $h(0) = h(1) =0$> Now all curves $q_a = c + ah$ (for any number $a$ near $0$) are paths from $A$ to $B$, and as you adjust $a$, they look more (or less) like $c$. And if $c$ is the SHORTEST path, then you know the $q_a$s are no shorter. – John Hughes Sep 09 '17 at 18:52
  • You can then compute $L(a) = length(q_a)$, and say that $dL/da(0)$ must be $0$, since $L(0) = length(q_0) = length(c)$. Fiddling with the resulting equations (typically using integration by parts) leads you to the conclusion that $c$ must be a straight line .. the first big theorem in calc of variations. For a map to $SO(3)$...all of this is messier, alas. – John Hughes Sep 09 '17 at 18:54

1 Answer


When we linearize the equation $F(b)=0$, the equation we get is no longer for the variable $b$, but rather for its variation, which can be thought of as an infinitesimal change in $b$. To put it precisely, the linearization about a known solution $b$ is the equation $$DF|_b (\beta) := \left.\frac{d}{dt}\right|_{t=0} F(b+t\beta) = 0$$ for the unknown $\beta$.

In the finite-dimensional case, e.g. where we are trying to solve the equation $$F(x,y) = x^2+y^2-1=0,$$ you should be familiar with $DF$ as the directional derivative of $F$, which acts on tangent vectors (variations!) based at $b$. The linearized equation at $(x,y) = (1,0)$ in this toy example is $$DF|_{(1,0)}(u,v) = (2x\, dx + 2y\, dy)(u,v) = 2u = 0;$$ so any vertical tangent vector based at $(1,0)$ is a linearized solution, corresponding to the fact that the solution space of $F=0$ has a vertical tangent line at this point.

Since the $F$ in the paper (the RHS of $(2)$ considered as a function of $b^i_j$) acts on $SO(3)$-valued functions, the corresponding variations $\beta^i_j$ of $b^i_j$ will be $\mathfrak{so}(3)$-valued (i.e. skew-symmetric matrix-valued) functions.
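To spell this out (a standard computation, added for completeness): take a curve $b(t) \in SO(3)$ with $b(0) = I$ and $b'(0) = \beta$, and differentiate the constraint $b(t)^T b(t) = I$ at $t = 0$:

$$\beta^T + \beta = 0,$$

so the tangent vectors to $SO(3)$ at the identity, i.e. the admissible variations when linearizing about $b \equiv \delta$, are exactly the skew-symmetric matrices.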

This is one of my main issues so far: DeTurck and Yang go on to prove this statement, but they don't say a word about how the existence of such functions $b^i_j$ follows from this fact. Why does the fact that the linearization of (2) is diagonal hyperbolic guarantee the existence of a solution? What exactly are our $b^i_j$, and why do they exist because of it?

Actually, immediately after the proof of linearized hyperbolicity they describe how this implies the existence of a solution $b$. Theorem 1.4 tells us that for a nonlinear symmetric hyperbolic system (i.e. a PDE system whose linearization is symmetric hyperbolic), any noncharacteristic initial data can be continued to a local solution. Thus once the authors have proved the system is hyperbolic, the proof can be completed by showing that noncharacteristic initial data can be found, which is what is described at the end of the proof. There isn't anything like an explicit formula for $\beta$: for the purposes of this paper the PDE existence theory is treated as a black box. The authors verify the assumptions (hyperbolicity and noncharacteristic data) and thus conclude the existence of a solution.
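For context (this is the standard definition from the PDE literature, paraphrased rather than quoted from the paper): a first-order system for an $\mathbb{R}^N$-valued unknown $u$ of the form

$$\sum_{\mu} A^\mu(x, u)\, \partial_\mu u = f(x, u)$$

is symmetric hyperbolic when the coefficient matrices $A^\mu$ are all symmetric and the combination $\sum_\mu \xi_\mu A^\mu$ is positive definite for $\xi$ conormal to the initial hypersurface; this positivity is exactly the noncharacteristic condition on the initial data. The diagonal system displayed in the proof is symmetric in a trivial way, since each equation contains derivatives of only one unknown.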

Why can we choose the reference frame $\bar\omega^i$ to be (locally, I assume?) equal to the desired frame?

There's a little bit of handwaving going on here, but I think the point is that while the equation we are solving is phrased in terms of a reference frame $\bar \omega$, it's really describing a geometric object that is independent of the reference frame chosen. This can probably be rigorously phrased as the $SO(3)$-invariance of $(2)$, though I'm not going to try to work this out now. The consequence of this would be that we can compute the linearization about any $b$ by rotating our frame to make $b$ the identity everywhere and then computing the linearization, as the authors do.
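Concretely, here is a sketch of what I believe this invariance amounts to (not worked out in the paper): if we rotate the reference coframe by some $a \in C^\infty(M, SO(3))$, setting $\bar\omega'^i = \sum_j a_j^i \bar\omega^j$, then the same candidate coframe $\omega^i = \sum_j b_j^i \bar\omega^j$ is represented with respect to the new reference frame by $b' = b\,a^{-1}$. Choosing $a = b$ makes $b' \equiv \mathrm{id}$, so linearizing about an arbitrary $b$ in the old reference frame amounts to linearizing about the identity in the rotated one.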

  • Thank you for the comprehensive explanation! I had glanced over theorem 1.4 before but didn't realize its connection to this theorem since it looked so different on paper. So to clarify, if we consider the RHS of equation $(2)$ as our function $F(b)$, then for the linearization, we instead just consider the equation $DF|_b (\beta) := \left.\frac{d}{dt}\right|_{t=0} F(b+t\beta) = 0$, with the rest of the proof just consisting of the authors simplifying that expression to show that it's diagonal? – moran Sep 10 '17 at 16:55
  • So the linearization can be thought of as a sort of directional derivative here, and the variation is essentially a generalization of the tangent vectors of the function $F$? And also, one more question: this noncharacteristic initial data you mention, is this the $b_i^j(x) = \delta_i^j$ that the authors get by setting the two frames equal? Or what part specifically is our initial data here in the proof? – moran Sep 10 '17 at 17:00
  • @moran: the linearization is a directional derivative and the variation is just the direction we are differentiating in. The initial data is just an orthonormal frame chosen along a small patch of some surface, with the noncharacteristic condition being that none of the vectors in the frame are ever tangent to the surface, as discussed in the last paragraph of the proof. – Anthony Carapetis Sep 10 '17 at 23:55
  • ok, thank you once again! – moran Sep 11 '17 at 00:02
  • Actually, another question if I may: from what I understand, theorems 1.4 and 1.5 require the manifold $M$ to be compact (see p. 245, first paragraph about symmetric-hyperbolic systems), whereas theorem 4.2, p. 255 (about the metrics being diagonalizable in $3$ dimensions) does not. So why can we still use theorems 1.4 and 1.5 of the paper here? Is this an argument like "W.l.o.g., we can assume $M$ to be compact, otherwise decompose $M$ into compact subsets and find solutions on them individually" or something like that? – moran Sep 13 '17 at 14:06