0

I'm doing a constrained optimization problem, but I want to know how this equation is derived. I understand it is made up of the Lagrange multiplier, the original function, and the constraint, but I want some intuition on how it works. $$\mathcal{L}(x,y,\lambda) = f(x,y) - \lambda g(x,y).$$

Ivo Terek
  • 80,301
mbzht
  • 11
  • Possible duplicate of https://math.stackexchange.com/questions/815566/how-to-interpret-lagrangian-function-specifically-not-lagrangian-multiplier and https://math.stackexchange.com/questions/3457970/why-not-define-the-lagrangian-like-this . This question is cleaner, but the answers there should also give you some insight. – Toby Bartels Feb 02 '20 at 18:25

3 Answers

2

The usual method of Lagrange multipliers requires solving the system $$\begin{cases} \nabla f = \lambda \nabla g \\ g(x,y) =0,\end{cases}$$right? All this information is captured in the Lagrangian $\mathcal{L}$ so defined, because $\mathcal{L}(x,y,0) = f(x,y)$ and $\nabla f = \lambda \nabla g$ together with $g(x,y)=0$ are equivalent to $\nabla\mathcal{L} = 0$. In this case I usually think of the Lagrangian as the function $f$ itself hit with a "$g$-penalty".
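
As a concrete illustration (not part of the original answer), here is a minimal SymPy sketch for a made-up toy problem, optimizing $f(x,y) = xy$ subject to $g(x,y) = x + y - 1 = 0$; the example and variable names are my own choices, assuming SymPy is available.

```python
# Toy problem (hypothetical example): optimize f(x, y) = x*y subject to g(x, y) = x + y - 1 = 0
# by solving grad L = 0 for the Lagrangian L = f - lambda*g described in this answer.
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)

f = x * y          # objective
g = x + y - 1      # written so that the constraint reads g = 0
L = f - lam * g    # Lagrangian

# Setting all three partial derivatives to zero packages
# grad f = lambda * grad g together with g = 0 into one system.
eqs = [sp.diff(L, v) for v in (x, y, lam)]
print(sp.solve(eqs, (x, y, lam), dict=True))
# -> [{x: 1/2, y: 1/2, lambda: 1/2}], the constrained optimum at (1/2, 1/2)
```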

Ivo Terek
  • 80,301
  • @mbzht For why this system must be solved, see https://math.stackexchange.com/q/23958/323432 – projectilemotion Feb 02 '20 at 18:21
  • Why is g(x,y) equal to zero instead of the constant? – mbzht Feb 02 '20 at 18:35
  • I assume that your constraint is $g(x,y) = 0$, as usual. If it is not $0$, then you have to tell us. If the constraint is $g(x,y) = c$, then your Lagrangian is $$\mathcal{L}(x,y,\lambda) = f(x,y) -\lambda(g(x,y)-c).$$ – Ivo Terek Feb 02 '20 at 18:45
  • I still don't get it. How does that get to the equation? – mbzht Feb 02 '20 at 19:23
0

For a non-geometrical approach, here is a slightly different argument.

WANT: find a critical point of $f(\vec{x})$ subject to $g(\vec{x})=0$.

Let $\vec{x_0}$ be a critical point of $f$ with $g(\vec{x_0})=0$.

For WANT, a small perturbation $\vec{\delta x}$ satisfying $g(\vec{x_0}+\vec{\delta x})= 0$ must not change $f$ to linear order (since $\vec{x_0}$ is a critical point).

Note that (writing $\delta = \|\vec{\delta x}\|$)

$$g(\vec{x_0} + \vec{\delta x})=0\Rightarrow g(\vec{x_0}) + \nabla g \cdot\vec{\delta x}+O( \delta^2) = 0 \Rightarrow \boldsymbol{\nabla g \cdot\vec{\delta x}=O( \delta^2) \approx 0}$$

Since $f$ does not change to linear order under such a perturbation, $$f(\vec{x_0} + \vec{\delta x})= f(\vec{x_0}) + O( \delta^2)\Rightarrow \boldsymbol{\nabla f \cdot\vec{\delta x} = O( \delta^2) \approx 0}.$$

Thus, our criterion for WANT will be

$\forall\, \vec{\delta x}:\ \nabla g \cdot\vec{\delta x} = 0 \ \text{ and }\ \delta \ll 1 \ \Rightarrow\ \nabla f \cdot\vec{\delta x} = 0$.

The above proposition reduces to

$\forall\, \vec{v}:\ \nabla g \cdot\vec{v} = 0 \ \Rightarrow\ \nabla f \cdot\vec{v} = 0\quad$ (since $\vec{a} \cdot \vec{b} = 0$ implies $\vec{a} \cdot k \vec{b}=0$, the smallness restriction on $\vec{\delta x}$ can be dropped).

Now consider the plane with normal vector $\nabla g$. All such $\vec{v}$ make up this plane. By the above proposition, $\nabla f$ is either $0$ or a normal vector of the plane.

Finally, $\nabla f = \lambda \nabla g$ for some $\lambda$ ($\lambda = 0$ is the case $\nabla f=0$; $\lambda \neq 0$ covers the case where $\nabla f$ is a normal vector of the plane).

Thus the system $\begin{cases} \nabla f = \lambda \nabla g \\ g =0\end{cases}$, which is equivalent to $\nabla \mathcal{L} = 0 $, gives a necessary condition for constrained critical points.
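
A quick numerical sanity check of the perturbation argument (my own addition, on a made-up toy problem $f(x,y) = xy$ with constraint $g(x,y) = x + y - 1 = 0$, whose constrained critical point is $(1/2, 1/2)$): a small step $\vec{\delta x}$ tangent to the constraint changes $f$ only at order $\delta^2$, exactly as claimed above.

```python
# Numerical check of the perturbation argument on a hypothetical toy problem:
# f(x, y) = x*y, constraint g(x, y) = x + y - 1 = 0, constrained critical point (1/2, 1/2).
import numpy as np

def f(p):
    return p[0] * p[1]

grad_g = np.array([1.0, 1.0])          # gradient of g = x + y - 1
x0 = np.array([0.5, 0.5])              # constrained critical point

tangent = np.array([1.0, -1.0])        # satisfies grad_g . tangent = 0
tangent /= np.linalg.norm(tangent)

for delta in (1e-1, 1e-2, 1e-3):
    change = f(x0 + delta * tangent) - f(x0)
    print(f"delta = {delta:.0e}, change in f = {change:.2e}")
# The change scales like delta**2, i.e. f does not change to linear order.
```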

needmoremath
  • 1,127
0

Ivo Terek has already shown how your formulation is equivalent to solving the system $$\begin{cases} \nabla f = \sum_i \lambda_i \nabla g_i \\ g_1 = g_2 = \cdots = g_n = 0. \end{cases}$$ In other words, the gradient at an optimal point (a Lagrange point) must be a linear combination of the gradients of the constraints. This idea can be intuitively understood as follows.

We'll just consider the single-constraint, two-variable version for simplicity. Consider the function $z = f(x,y)$ and its $z$-level curves (that is, the level curves obtained by fixing $z$). Let's call a level curve $z = c$ feasible if there exists a solution $(x_c, y_c)$ to the constrained problem (no optimization yet) $$f(x_c,y_c) = c, \quad g(x_c, y_c) = 0 $$ for the single constraint $g$. Now assume that there is an optimal feasible point $\vec{\alpha} = (x^*, y^*)$ which yields $f(x^*, y^*) = c$, but such that $\nabla f(x^*, y^*) \neq \lambda \nabla g(x^*, y^*)$ for every $\lambda$. Consider "walking" along the constraint curve a small amount in the direction $\vec{d}$. The key idea is that a smooth function $f$ can be linearized in a small vicinity around $\vec{\alpha}$ as $f \approx f(\vec{\alpha}) + \nabla f \cdot (\delta x, \delta y)$, where $(\delta x, \delta y)$ is the displacement vector from $\vec{\alpha}$. If $\vec{d} \cdot \nabla f(\vec{\alpha}) > 0$ for some direction $\vec{d}$ along the constraint $g(x,y) = 0$, then to linear order $f(\vec{\alpha} + \vec{d}) = f(\vec{\alpha}) + \nabla f \cdot \vec{d} > f(\vec{\alpha})$. Similarly, $f(\vec{\alpha} - \vec{d}) = f(\vec{\alpha}) - \nabla f \cdot \vec{d} < f(\vec{\alpha})$, which means that $\vec{\alpha}$ was neither a maximum nor a minimum to start with. Therefore, it must have been the case that $\nabla f \cdot \vec{d} = 0$ for every displacement $\vec{d}$ along the constraint. But the only vectors perpendicular to every such displacement are the scalar multiples of $\nabla g(\vec{\alpha})$, and we thus have $\nabla f = \lambda \nabla g$ at an optimal point.
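
For what it's worth, here is a small NumPy sketch of the "walking along the constraint" argument on a made-up example (the functions and points below are my own choices, not part of the answer): $f(x,y) = x + y$ on the circle $g(x,y) = x^2 + y^2 - 1 = 0$.

```python
# Sketch of the "walking along the constraint" argument (hypothetical example):
# f(x, y) = x + y on the circle g(x, y) = x**2 + y**2 - 1 = 0.
import numpy as np

def grad_f(p):
    return np.array([1.0, 1.0])                 # gradient of f = x + y

def grad_g(p):
    return np.array([2.0 * p[0], 2.0 * p[1]])   # gradient of g = x^2 + y^2 - 1

def tangent(p):
    gx, gy = grad_g(p)
    t = np.array([-gy, gx])                     # rotate grad g by 90 degrees
    return t / np.linalg.norm(t)

non_optimal = np.array([1.0, 0.0])              # feasible but not optimal
optimal = np.array([1.0, 1.0]) / np.sqrt(2.0)   # the constrained maximizer

for name, p in [("non-optimal", non_optimal), ("optimal", optimal)]:
    d = tangent(p)
    print(f"{name}: grad_f . d = {grad_f(p) @ d:+.3f}")
# At the non-optimal point grad_f . d is nonzero, so walking along the constraint
# changes f to linear order; at the maximizer it vanishes, so grad f || grad g there.
```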

This idea can be generalized to multiple constraints and higher dimensions using the same intuition of "walking" along the constraint set. With multiple constraints, we can understand the gradients of the constraints as spanning the directions perpendicular to the constraint set. A point $\vec{p}$ can be an optimal point only if the gradient of the objective function $f$ is perpendicular to the constraint set. Therefore, the gradient of $f$ must be some linear combination of the constraint gradients, as the principle of Lagrange multipliers posits. Hope this helped!
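
A small symbolic sketch of the multi-constraint case (my own hypothetical example, assuming SymPy): two constraints in three variables, with one multiplier per constraint.

```python
# Two constraints in three variables (hypothetical example): optimize f = x + y + z
# subject to g1 = x^2 + y^2 + z^2 - 1 = 0 and g2 = z = 0.
import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z lambda_1 lambda_2', real=True)

f = x + y + z
g1 = x**2 + y**2 + z**2 - 1
g2 = z

# One multiplier per constraint: grad f must be a combination of grad g1 and grad g2.
L = f - l1 * g1 - l2 * g2
eqs = [sp.diff(L, v) for v in (x, y, z, l1, l2)]
for sol in sp.solve(eqs, (x, y, z, l1, l2), dict=True):
    print(sol)
# -> x = y = ±1/sqrt(2), z = 0, with lambda_1 = ±1/sqrt(2) and lambda_2 = 1
```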

EDIT: A little clarification on the quantity which we call the Lagrangian $\mathcal{L}$. Normally, in unconstrained optimization, when we want to optimize a function $f$, we find critical points, namely points where $\nabla f = 0$. In constrained optimization, however, where we optimize under the constraint $g(x,y) = c$, it's clear that in general the equation $\nabla f = 0$ will not give us the points of interest. We can ask: is there some other function $\mathcal{L}$ - a function of $x$, $y$, and perhaps some other variables - whose critical points correspond to the optimal points of the constrained optimization problem? The answer is yes: the function $$\mathcal{L}(x,y,\lambda) = f(x,y) - \lambda (g(x,y) - c),$$ called the Lagrangian, has critical points that correspond directly to the optimal points of the constrained optimization problem. Namely, $\nabla \mathcal{L} = 0$ implies both $\nabla f = \lambda \nabla g$ and $g(x,y) = c$ (and the reason why these equations find optimal points is explained above).
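
A symbolic spot-check of this claim (my own addition, assuming SymPy): writing the Lagrangian with generic $f$ and $g$ and differentiating shows that $\nabla\mathcal{L} = 0$ is exactly $\nabla f = \lambda\nabla g$ together with $g(x,y) = c$.

```python
# Symbolic check that grad L = 0 encodes both conditions at once,
# for the shifted constraint g(x, y) = c used in this edit.
import sympy as sp

x, y, lam, c = sp.symbols('x y lambda c')
f = sp.Function('f')(x, y)
g = sp.Function('g')(x, y)

L = f - lam * (g - c)

print(sp.diff(L, x))    # Derivative(f, x) - lambda*Derivative(g, x)  ->  f_x = lambda*g_x
print(sp.diff(L, y))    # Derivative(f, y) - lambda*Derivative(g, y)  ->  f_y = lambda*g_y
print(sp.diff(L, lam))  # c - g(x, y)                                 ->  g = c
```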

EDIT 2: A little more motivation behind why we define the Lagrangian this way. Recall our objective: find a function $\mathcal{L}$ of $x$, $y$, and some other variable(s) $\lambda$ whose critical points and critical values correspond, in $x$ and $y$, to the optimal points and optimal values of the constrained optimization problem. This means that, first, if $(x, y)$ is an optimal point of the constrained optimization problem, then $(x, y, \lambda)$ should be a critical point of $\mathcal{L}$ for some $\lambda$. The other condition can be accomplished if we stipulate that $\mathcal{L}(x, y, \lambda) = f(x, y)$ for every $(x, y)$ satisfying the constraint $g(x, y) = c$. In that case we will have encapsulated the constraint in the Lagrangian, because if we find a critical point $(x, y, \lambda)$ of the Lagrangian, then $\mathcal{L}(x, y, \lambda) = f(x,y)$ will be exactly the value attained at the optimal point by $f$. It should be straightforward to verify that the Lagrangian defined earlier satisfies both of these properties.
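
To spell out that verification (my own addition): on the constraint set $g(x,y) = c$ the penalty term vanishes, so $$\mathcal{L}(x,y,\lambda) = f(x,y) - \lambda\,\underbrace{(g(x,y) - c)}_{=0} = f(x,y),$$ and setting all partial derivatives of $\mathcal{L}$ to zero gives $$\frac{\partial\mathcal{L}}{\partial x} = f_x - \lambda g_x = 0, \qquad \frac{\partial\mathcal{L}}{\partial y} = f_y - \lambda g_y = 0, \qquad \frac{\partial\mathcal{L}}{\partial \lambda} = -(g(x,y) - c) = 0,$$ i.e. $\nabla f = \lambda\nabla g$ together with $g(x,y) = c$, so both properties hold.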

paulinho
  • 6,730
  • Hi paulinho, this is a bit too complicated for me. I understand that where $\nabla f = \lambda \nabla g$, there is an optimal point. To my understanding, the Lagrangian encapsulates this equation and the constraint into one vector. But I am still confused about the process to get the Lagrangian equation. – mbzht Feb 02 '20 at 21:32
  • I've added an edit. – paulinho Feb 02 '20 at 22:02
  • This makes more sense. But could you explain a bit more how $\nabla f = \lambda \nabla g$ and $g(x,y) = c$ combine to make the Lagrangian? – mbzht Feb 02 '20 at 22:10
  • Added another edit. – paulinho Feb 02 '20 at 22:50