Let's not consider a closed path for now, but rather some smooth curve $\gamma:[0,1]\to M$, and look at the functional $$S[\gamma]:=\int_0^1 \left(\sum_ip_i(\gamma(t))\, \mathrm d q_i(\gamma(t))-H(\gamma(t))\, \mathrm d t\right)$$
where I'm working in a chart with coordinates $(q_1,\dots,q_n,p_1,\dots,p_n)$. An old principle of classical mechanics (for trajectories with fixed endpoints) states that this functional is stationary at the true physical trajectory. You can read elsewhere about how the calculus of variations works rigorously; here I will compute the first variation of this functional a bit heuristically. To get the variation, we compute $S[\gamma + \epsilon \eta]-S[\gamma]=\epsilon\, \delta S[\gamma]+O(\epsilon^2)$, where $\eta:[0,1]\to M$ is a variation (the sum $\gamma+\epsilon\eta$ is to be understood in the chart) with $\eta(0)=\eta(1)=0$. In practice, we can get the variation by operating with $\delta$ as if it were an ordinary differential, so for instance $\delta(a b)=a\,\delta b + b\,\delta a$, $\delta f(x,y)=f_x\, \delta x + f_y\, \delta y$, etc. The Hamiltonian depends on the curve through $H(q_1(\gamma(t)),\dots,q_n(\gamma(t)),p_1(\gamma(t)),\dots,p_n(\gamma(t)))$, so in varying this we get
$$\delta S = \int_0^1 \sum_i\left(\delta p_i \mathrm d q_i + p_i\mathrm d \delta q_i\right)-\sum_i\left(\frac{\partial H}{\partial p_i}\delta p_i + \frac{\partial H}{\partial q_i}\delta q_i\right)\mathrm d t$$
and now we note that $p_i\,\mathrm d \delta q_i=\mathrm d(p_i\,\delta q_i)-\delta q_i\,\mathrm d p_i$. This means that we may integrate by parts:
$$\delta S = \int_0^1 \sum_i\left[\left(\mathrm d q_i -\frac{\partial H}{\partial p_i}\mathrm d t \right) \delta p_i -\left(\mathrm d p_i +\frac{\partial H}{\partial q_i} \mathrm d t \right) \delta q_i\right] + \sum_i p_i\,\delta q_i\Big|_{0}^1.$$
The last term is usually called a surface term, and it vanishes because the variation vanishes at the endpoints, since the endpoints are fixed.
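Spelled out, just to make that explicit,
$$\sum_i p_i\,\delta q_i\Big|_{0}^{1}=\sum_i\Big(p_i(\gamma(1))\,\delta q_i(1)-p_i(\gamma(0))\,\delta q_i(0)\Big)=0,$$
since $\delta q_i(0)=\delta q_i(1)=0$ for variations that keep the endpoints fixed.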
Requiring $\delta S = 0$ gives
$$\frac{\mathrm d q_i}{\mathrm d t}=\frac{\partial H}{\partial p_i},\,\frac{\mathrm d p_i}{\mathrm d t}=-\frac{\partial H}{\partial q_i}$$
along $\gamma$. This is the same as requiring that $\gamma$ be an integral curve of the Hamiltonian vector field $X_H$ defined by $\mathrm dH=i_{X_H}\omega$ (modulo a minus sign I always forget). Historically, this was the motivation for symplectic geometry, with the symplectic manifold being the cotangent bundle $T^*Q$ of the $n$-dimensional configuration space $Q$, which is nothing more than the manifold where the (generalized) coordinates $q_i$ live.
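To pin down the sign in one common convention (a sketch with my choice of convention, which may differ from your book's): take $\omega=\sum_i\mathrm dq_i\wedge\mathrm dp_i$ and let $X_H$ have the components appearing in Hamilton's equations,
$$X_H=\sum_i\left(\frac{\partial H}{\partial p_i}\frac{\partial}{\partial q_i}-\frac{\partial H}{\partial q_i}\frac{\partial}{\partial p_i}\right),\qquad i_{X_H}\omega=\sum_i\left(\frac{\partial H}{\partial p_i}\,\mathrm dp_i+\frac{\partial H}{\partial q_i}\,\mathrm dq_i\right)=\mathrm dH,$$
so with this $\omega$ there is no extra sign; with $\omega=\sum_i\mathrm dp_i\wedge\mathrm dq_i$ the sign flips, and that is exactly the sign one keeps forgetting.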
Alternatively, writing $x=(q_1,\dots,q_n,p_1,\dots,p_n)$ and letting $J$ denote the standard symplectic matrix (up to the same sign ambiguity as before), we may write $$\dot x = J\nabla H,\qquad J=\begin{pmatrix}0 & I\\ -I & 0\end{pmatrix}.$$
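If it helps to see this concretely, here is a minimal numerical sketch of my own (not from the book), assuming the 1D harmonic oscillator $H=\tfrac12(q^2+p^2)$ and the $J$ above:

```python
# Integrate x' = J grad H(x) for H(q, p) = (q^2 + p^2)/2, with x = (q, p).
# Everything here (the Hamiltonian, the step size, RK4) is an illustrative choice.
import numpy as np

J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])        # the 2x2 version of the J above

def grad_H(x):
    # For H = (q^2 + p^2)/2 the gradient is just (q, p).
    return x

def rk4_step(x, dt):
    # One classical 4th-order Runge-Kutta step for x' = J grad H(x).
    f = lambda y: J @ grad_H(y)
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x = np.array([1.0, 0.0])           # start at q = 1, p = 0
dt, T = 0.01, 2 * np.pi            # integrate over one expected period
for _ in range(int(T / dt)):
    x = rk4_step(x, dt)

print(x)               # approximately (1, 0): the orbit closes after 2*pi
print(0.5 * (x @ x))   # approximately 0.5: H is conserved along the flow
```

Coming back (approximately) to the starting point after time $2\pi$ with $H$ conserved is a cheap sanity check that the signs in $\dot x=J\nabla H$ are consistent with Hamilton's equations as written above.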
The equations $\dot q_i=\partial H/\partial p_i$, $\dot p_i=-\partial H/\partial q_i$ are properly called Hamilton's equations, although they may also be viewed as Euler-Lagrange equations, which come from the same procedure applied to a functional of the form $\int L(\dot q,q;t)\,\mathrm dt$. The book is probably referring to them as Euler equations because they are precisely the Euler-Lagrange equations of this particular functional.
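To make that last remark concrete, here is a small symbolic sketch of my own (the harmonic oscillator is chosen only to keep the output readable): hand the phase-space Lagrangian $L=p\dot q-H(q,p)$ to sympy's Euler-Lagrange routine and Hamilton's equations come back out.

```python
# Euler-Lagrange equations of the integrand p*dq - H*dt, treated as a
# Lagrangian in the two dependent variables q(t) and p(t).
from sympy import Function, Rational, Symbol
from sympy.calculus.euler import euler_equations

t = Symbol('t')
q = Function('q')(t)
p = Function('p')(t)

H = Rational(1, 2) * (q**2 + p**2)   # example Hamiltonian (harmonic oscillator)
L = p * q.diff(t) - H                # the integrand of the action functional

for eq in euler_equations(L, [q, p], t):
    print(eq)
# Up to ordering and overall signs this prints
#   q(t) + Derivative(p(t), t) = 0   and   Derivative(q(t), t) - p(t) = 0,
# i.e. Hamilton's equations p' = -dH/dq and q' = dH/dp for this H.
```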
Now, for your particular problem, you have periodic motion. I remember that you asked in this question why, over a closed path, we get
$$\int_{\gamma} \lambda:=\int_{\gamma} \sum_i p_i\,\mathrm d q_i = \frac12\int_0^{2\pi}\langle-J\dot x,x\rangle\,\mathrm d t$$
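With the conventions above ($x=(q_1,\dots,q_n,p_1,\dots,p_n)$ and the $J$ written earlier; your book's signs may differ), the identity is quick to check:
$$\tfrac12\langle-J\dot x,x\rangle=\tfrac12\sum_i\left(p_i\dot q_i-q_i\dot p_i\right),$$
and on a closed curve $\oint\mathrm d(p_iq_i)=0$, i.e. $\oint q_i\,\mathrm dp_i=-\oint p_i\,\mathrm dq_i$, so integrating the right-hand side over $[0,2\pi]$ gives exactly $\oint\sum_i p_i\,\mathrm dq_i=\int_\gamma\lambda$.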
Your constraint is basically a normalization of this quantity: since $\langle J x,Jy\rangle=\langle x,y\rangle$, the two expressions really are equivalent. To obtain the correct equations of motion (as they are called) from the calculus of variations when constraints are present, we use Lagrange multipliers.
Consider the problem of making $\int F(\dot x,x,t)\, \mathrm dt$ stationary, where $F$ is some function of these arguments, subject to the constraint $\int M(\dot x,x,t)\, \mathrm dt= m$. This can mean, for instance, that the total mass is $m$, or whatever the constraint happens to encode. The correct functional to which we may apply the Euler-Lagrange equations is
$$\int\left(F(\dot x,x,t) + \mu\, M(\dot x,x,t)\right)\mathrm dt,$$
where $\mu$ is a constant Lagrange multiplier (constant, rather than a function, because the constraint is an integral one and not a pointwise one).
So, by saying "minimize $\int H(x(t))\,\mathrm dt$ subject to the constraint", I'm really saying "minimize something which is, up to constant factors, the so-called action functional I started this answer with" (the constant multiplier $\mu$ only rescales the Hamiltonian and doesn't change the trajectories, as sketched below). All of this should answer your questions 1. and 2.
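For completeness, here is that sketch, with the conventions I've been using (so the signs are mine and may differ from your book's). Take $F=H(x)+\mu\cdot\tfrac12\langle-J\dot x,x\rangle$ and apply the Euler-Lagrange equations; using $\langle-J\dot x,x\rangle=\dot x^{\mathsf T}Jx$, $J^{\mathsf T}=-J$ and $J^2=-I$,
$$\frac{\mathrm d}{\mathrm dt}\frac{\partial F}{\partial \dot x}-\frac{\partial F}{\partial x}=\frac{\mu}{2}J\dot x-\nabla H+\frac{\mu}{2}J\dot x=0\quad\Longrightarrow\quad \mu\,J\dot x=\nabla H,$$
i.e. $\dot x=J\nabla(-H/\mu)$. These are Hamilton's equations again, just with $H$ rescaled by the constant $-1/\mu$, which amounts to a constant reparametrization of time and leaves the trajectories themselves unchanged.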
Whether $S[\gamma]$ attains an infimum or supremum isn't important; only whether it is stationary is. I don't know how I would show whether it attains a minimum or a maximum here. That would require looking at the second variation, or at something like the non-negativity of the Weierstrass excess (or simply E) function $E(x,y,z,w)=F(x,y,w)-F(x,y,z)-(w-z)F_{y'}(x,y,z)$, where $F_{y'}$ denotes the derivative of $F$ with respect to its third slot.
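As a toy illustration of how the excess function gets used (my own example, not from the book): for $F(x,y,y')=\tfrac12 y'^2$ one finds
$$E(x,y,z,w)=\tfrac12 w^2-\tfrac12 z^2-(w-z)\,z=\tfrac12(w-z)^2\ \ge\ 0,$$
so, together with the other classical sufficiency conditions, extremals of $\int\tfrac12\dot y^2\,\mathrm dx$ (straight lines) are genuine minimizers. Carrying out this kind of check for the action functional above is much more delicate, which is why I'm only pointing at it.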