30

I've always used the method of Lagrange multipliers with blind confidence that it gives correct results when optimizing a function subject to constraints. But I would like to know if anyone can provide or recommend a derivation of the method, at physics undergraduate level, that highlights its limitations, if any.

  • @John, you may or may not find the answers to this similar question helpful: http://math.stackexchange.com/q/674/400 – Vladimir Sotirov Feb 27 '11 at 06:03
  • Relevant thread: https://math.stackexchange.com/questions/1760709/how-to-prove-lagrange-multiplier-theorem-in-a-rigorous-but-intuitive-way – littleO May 09 '25 at 07:10

3 Answers

47

Lagrange multipliers are used to obtain the maximum of a function $f(\mathbf{x})$ on a surface $\{ \mathbf{x}\in\mathbb{R}^n\mid g(\mathbf{x}) = 0\}$ (I use "surface", but whether it is a 2-dimensional, 1-dimensional, or whatever-dimensional object will depend on the $g$ and the $\mathbb{R}^n$ we are dealing with).

The gradient of $f$, $\nabla f$, points in the direction of greatest increase for $f$. If we want to find the largest value of $f$ along the surface $g=0$, then we need the direction of greatest increase to be orthogonal to the surface; otherwise, moving along the surface will "capture" some of that increase and $f$ will not achieve its maximum on $g=0$ at that point (this is akin to the fact that in one-variable calculus, the derivative should be $0$ at the maximum; otherwise, moving a bit in one direction will increase the value of the function).

In order for $\nabla f$ to be perpendicular to the surface, it must be parallel to the gradient of $g$ (which is normal to the level surface $g=0$); so $\nabla f$ must be a scalar multiple of $\nabla g$. This amounts to finding a solution to the system \begin{align*} \nabla f(\mathbf{x}) &= \lambda \nabla g(\mathbf{x})\\ g(\mathbf{x}) &= 0 \end{align*} for both $\mathbf{x}$ and $\lambda$.
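As a concrete illustration of solving this system (my own hypothetical example, not part of the original answer), here is a short SymPy sketch for $f(x,y)=xy$ on the unit circle $g(x,y)=x^2+y^2-1=0$:

```python
# A minimal sketch, assuming SymPy is available; the choice of f and g
# is illustrative. Solve grad f = lam * grad g together with g = 0.
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x * y
g = x**2 + y**2 - 1

eqs = [
    sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),  # f_x = lam * g_x
    sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),  # f_y = lam * g_y
    sp.Eq(g, 0),                                # stay on the surface
]
for s in sp.solve(eqs, [x, y, lam], dict=True):
    print(s, '  f =', f.subs(s))  # compare f at each candidate
```

Of the four candidate points this produces, two give $f = 1/2$ (the constrained maxima) and two give $f = -1/2$ (the minima), which already illustrates that solving the system alone does not tell you which kind of point you have.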

Added. Such a point is not guaranteed to be a maximum or a minimum; it could also be a saddle point, or nothing at all, much as in the one-variable case, where points with $f'(x)=0$ are not guaranteed to be extrema of the function. Another obvious limitation is that if $g$ is not differentiable (does not have a well-defined gradient), then you cannot even set up the system.
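To make the limitations still more concrete, here is a hypothetical example (my addition, not from the original answer) where the method misses the optimum even though $g$ is perfectly smooth, because $\nabla g$ vanishes there: minimize $f(x,y) = x$ subject to $g(x,y) = x^3 - y^2 = 0$. The constraint forces $x = (y^2)^{1/3} \geq 0$, so the minimum sits at the origin, but $\nabla g(0,0) = (0,0)$ and the system has no solution at all:

```python
# A minimal sketch, assuming SymPy; the cuspidal constraint x^3 = y^2
# is an illustrative choice. The multiplier system is inconsistent.
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x                    # minimize x ...
g = x**3 - y**2          # ... on the cusp curve x^3 = y^2

eqs = [
    sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),  # 1 = 3*lam*x**2
    sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),  # 0 = -2*lam*y
    sp.Eq(g, 0),
]
print(sp.solve(eqs, [x, y, lam], dict=True))    # [] -- no candidates,
# yet the constrained minimum is at (0, 0), where grad g vanishes.
```

So even with differentiable data, one needs $\nabla g \neq 0$ on the surface (a constraint qualification) for the method to be guaranteed to flag the optimum.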

Arturo Magidin
  • +1, although I would add, as a nod to the OP's request to "highlight its limitations," that not every solution to the system is guaranteed to be a maximum or minimum of $f$ on $g(\mathbf{x}) = 0$ (as with the single-variable case with the derivative being zero). – Mike Spivey Feb 27 '11 at 05:17
  • @Mike Good point. – Arturo Magidin Feb 27 '11 at 05:19
  • Very easy to understand, thanks. – John McVirgooo Feb 27 '11 at 05:20
  • An answer by @ArturoMagidin! I think I will upvote without reading it. And then I'll read it. – badatmath Nov 15 '12 at 00:33
  • Can you please elaborate on what you mean by "The gradient of $f$, $\nabla f$, points in the direction of greatest increase for $f$."? Do you mean that $\nabla f$ points in the direction of the greatest increase for $f$ at the point where $f$ is maximal? – M Smith Dec 02 '15 at 16:29
  • Nice answer, but I think you have an extra copy of the words "will increase" – J. W. Tanner Feb 03 '20 at 01:02
  • @J.W.Tanner: Yes, but I'm not going to bump an 11-year-old question to correct a bit of grammar that does not obscure the meaning nor misleads the reader. – Arturo Magidin Feb 03 '20 at 01:31
23

An algebraic way of looking at this is as follows:

From an algebraic viewpoint, we know how to find the extrema of a function of many variables without constraints: to find the extrema of $f(x_1,x_2,\ldots,x_n)$, we set the gradient to zero and look at the definiteness of the Hessian.
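For instance (my own illustration, not part of the original answer), the unconstrained recipe in SymPy:

```python
# A minimal sketch, assuming SymPy; f is an illustrative choice.
# Unconstrained extremum: set the gradient to zero, test the Hessian.
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 + x*y + y**2

grad = [sp.diff(f, v) for v in (x, y)]
crit = sp.solve(grad, [x, y], dict=True)        # [{x: 0, y: 0}]

H = sp.hessian(f, (x, y))                       # constant matrix here
print(crit, H.subs(crit[0]).is_positive_definite)  # True: a minimum
```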

We would like to extend this idea to the case where we want to find the extremum of a function subject to some constraints. Say the problem is: $$\begin{align} &\text{Minimize } f(x_1,x_2,\ldots,x_n)\\ &\text{subject to: } g_k(x_1,x_2,\ldots,x_n) = 0\\ &\text{where } k \in \{1,2,\ldots,m\} \end{align}$$

If we find the extrema of $f$ just by setting the gradient of $f$ to zero, these extrema need not satisfy the constraints.

Hence, we would like to include the constraints in the previous idea. One way to do this is as follows. Define a new function: $$F(\vec{x},\vec{\lambda}) = f(\vec{x}) - \lambda_1 g_1(\vec{x}) - \lambda_2 g_2(\vec{x}) - \cdots - \lambda_m g_m(\vec{x})$$ where $\vec{x} = \left[ x_1,x_2,\ldots,x_n \right]$ and $\vec{\lambda} = \left[\lambda_1,\lambda_2,\ldots,\lambda_m \right]$.

Note that when the constraints are enforced, we have $F(\vec{x},\vec{\lambda}) = f(\vec{x})$, since each $g_j(\vec{x}) = 0$.

Let us find the extremum of $F(\vec{x},\vec{\lambda})$. This is done by setting $\frac{\partial F}{\partial x_i} = 0$ and $\frac{\partial F}{\partial \lambda_j} = 0$, where $i \in \{1,2,\ldots,n\}$ and $j \in \{1,2,\ldots,m\}$.

Setting $\frac{\partial F}{\partial x_i} = 0$ for all $i$ gives us $$\vec{\nabla}f = \sum_{j=1}^{m} \lambda_j \vec{\nabla}g_j(\vec{x}),$$ which can be written compactly as $\vec{\nabla}f = \vec{\nabla}g \cdot \vec{\lambda}$, where $\vec{\nabla}g = \left[\vec{\nabla} g_1(\vec{x}),\vec{\nabla} g_2(\vec{x}),\ldots,\vec{\nabla} g_m(\vec{x}) \right]$.

Setting $\frac{\partial F}{\partial \lambda_j} = 0$ gives us $$g_j(\vec{x}) = 0,$$ where $j \in \{1,2,\ldots,m\}$.

Hence, at an extremum of $F$, the constraints are automatically enforced. This means that the extrema of $F$ correspond to the extrema of $f$ with the constraints enforced.

To decide whether the point we obtain by solving the system is a minimum, a maximum, or a saddle point, we need a second-order test. Note that the full Hessian of $F$ in $(\vec{x},\vec{\lambda})$ is never positive or negative definite at such a point (when the constraint gradients are nonzero), so the appropriate test is the bordered Hessian: the definiteness of the Hessian of $F$ with respect to $\vec{x}$, restricted to directions tangent to the constraint surfaces.
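Here is a short SymPy sketch of this recipe (my own illustration with a hypothetical problem, not part of the original answer): minimize $f(x,y,z) = x^2+y^2+z^2$ subject to $g_1 = x+y+z-1 = 0$ and $g_2 = x-y = 0$.

```python
# A minimal sketch, assuming SymPy; the problem is an illustrative choice.
# Build F = f - lam1*g1 - lam2*g2 and set ALL its partials to zero.
import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z l1 l2', real=True)
f = x**2 + y**2 + z**2
g1 = x + y + z - 1
g2 = x - y

F = f - l1 * g1 - l2 * g2
unknowns = [x, y, z, l1, l2]
eqs = [sp.diff(F, v) for v in unknowns]   # dF/dx_i = 0 and dF/dlam_j = 0

print(sp.solve(eqs, unknowns, dict=True))
# -> x = y = z = 1/3 (with l1 = 2/3, l2 = 0): the constraints
#    g1 = 0 and g2 = 0 are recovered automatically.
```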

  • +1. The approach Sivaram describes here also leads to a notion of duality for nonlinear optimization problems and ultimately to the important Karush-Kuhn-Tucker conditions. – Mike Spivey Feb 27 '11 at 05:49
0

[This is a very heuristic explanation of Lagrange multipliers.]

Consider the optimization problem

$${ {\begin{aligned} &\, \text{minimize } \quad f(x) \\ &\, \text{subject to } \, \, \, \, H(x) = 0 \end{aligned}} }$$

where ${ f : \mathbb{R} ^n \longrightarrow \mathbb{R} }$ and ${ H : \mathbb{R} ^n \longrightarrow \mathbb{R} ^{\ell} }$ are smooth functions.

Suppose at every point in the feasible set ${ \lbrace x : H(x) = 0 \rbrace , }$ the gradients ${ \nabla H _1 (x), \ldots, \nabla H _{\ell} (x) }$ are linearly independent.
Hence by the implicit function theorem, near any point in the feasible set ${ \lbrace x : H(x) = 0 \rbrace , }$ there are ${ \ell }$ coordinates (of the feasible set) which can be expressed as smooth functions of the other ${ (n-\ell) }$ coordinates.

Suppose ${ x ^{\ast} }$ is a local minimizer of ${ f(x) }$ under the constraints ${ H(x) = 0 .}$

Consider perturbations ${ \Delta x }$ such that ${ x ^{\ast} + \Delta x }$ stays approximately in the feasible set.
Equivalently, consider perturbations ${ \Delta x }$ which lie in the tangent space to the feasible set ${ \lbrace x : H(x) = 0 \rbrace }$ at ${ x ^{\ast} . }$

For brevity, the tangent space to the feasible set ${ \mathscr{F} = \lbrace x : H(x) = 0 \rbrace }$ at ${ x ^{\ast} }$ is denoted ${ T _{x ^{\ast}} \mathscr{F} . }$
Note that intuitively ${ T _{x ^{\ast}} \mathscr{F} }$ is ${ (n-\ell) }$ dimensional.

Note that

  • ${ H(x ^{\ast} + \Delta x) \approx 0 }$ for all small ${ \Delta x }$ in ${ T _{x ^{\ast}} \mathscr{F} .}$
  • ${ f(x ^{\ast} + \Delta x) \approx f(x ^{\ast}) }$ for all small ${ \Delta x }$ in ${ T _{x ^{\ast}} \mathscr{F} }$ (to first order: since both ${ \Delta x }$ and ${ -\Delta x }$ are tangent directions at the local minimizer, the first-order change in ${ f }$ can be neither positive nor negative).

Hence

  • ${ \nabla H _1 (x ^{\ast}) ^T \Delta x = 0, \ldots, \nabla H _{\ell} (x ^{\ast}) ^T \Delta x = 0 }$ for all small ${ \Delta x }$ in ${ T _{x ^{\ast}} \mathscr{F} . }$
  • ${ \nabla f(x ^{\ast}) ^T \Delta x = 0 }$ for all small ${ \Delta x }$ in ${ T _{x ^{\ast}} \mathscr{F} . }$

Note that ${ T _{x ^{\ast}} \mathscr{F} }$ is ${ (n - \ell) }$ dimensional, and the ${ \ell }$ linearly independent gradients ${ \nabla H _1 (x ^{\ast}), \ldots, \nabla H _{\ell} (x ^{\ast}) }$ are normal to ${ T _{x ^{\ast}} \mathscr{F} . }$

Hence

  • ${ \nabla H _1 (x ^{\ast}), \ldots, \nabla H _{\ell} (x ^{\ast}) }$ form a basis of ${ (T _{x ^{\ast}} \mathscr{F}) ^{\perp} .}$
  • ${ \nabla f(x ^{\ast}) \in (T _{x ^{\ast}} \mathscr{F}) ^{\perp} . }$

Hence there exist unique ${ \lambda _1 ^{\ast}, \ldots, \lambda _{\ell} ^{\ast} \in \mathbb{R} }$ such that

$${ \nabla f(x ^{\ast}) = \sum _{i=1} ^{\ell} \lambda _i ^{\ast} \nabla H _i (x ^{\ast}) .}$$

The ${ \lambda _i ^{\ast} }$s are called Lagrange multipliers.

Note that the necessary conditions for ${ x ^{\ast} }$ being a local minimizer

$${ {\begin{cases} \, \nabla f(x ^{\ast}) = \sum _{i=1} ^{\ell} \lambda _i ^{\ast} \nabla H _i (x ^{\ast}), \\ \, H(x ^{\ast}) = 0 \end{cases}} }$$

can be re-expressed as:

The point ${ (x ^{\ast}, - \lambda ^{\ast}) }$ is a critical point of

$${ {\begin{aligned} &\, \mathcal{L} : \mathbb{R} ^n \times \mathbb{R} ^{\ell} \longrightarrow \mathbb{R}, \\ &\, \mathcal{L} (x, \lambda) := f(x) + \lambda ^T H(x). \end{aligned}} }$$

The function ${ \mathcal{L} }$ is called the Lagrangian.

Hence the critical points of the Lagrangian give the potential candidates for the local minimizers of the constrained optimization problem.
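As a numerical sanity check of the key geometric fact above (my own illustration, with NumPy and a hypothetical problem): at a known minimizer, ${ \nabla f(x ^{\ast}) }$ should lie in the span of the constraint gradients, and least squares recovers the multipliers. Take ${ f(x) = x_1 + x_2 + x_3 }$ on the unit sphere ${ H(x) = \lVert x \rVert ^2 - 1 = 0 , }$ whose minimizer is ${ x ^{\ast} = -(1,1,1)/\sqrt{3} . }$

```python
# A minimal numerical sketch, assuming NumPy; the problem is illustrative.
import numpy as np

x_star = -np.ones(3) / np.sqrt(3)   # known constrained minimizer

grad_f = np.ones(3)                 # gradient of x1 + x2 + x3
grad_H = 2 * x_star                 # gradient of |x|^2 - 1

# Solve grad_f = lam * grad_H in the least-squares sense
# (one column per constraint gradient; here ell = 1).
A = grad_H.reshape(-1, 1)
lam, *_ = np.linalg.lstsq(A, grad_f, rcond=None)

print('lambda* =', lam[0])                            # -sqrt(3)/2 = -0.866...
print('residual =', np.linalg.norm(A @ lam - grad_f)) # 0: grad f is in the span
```

Note the sign convention: this recovers ${ \lambda ^{\ast} }$ in the form ${ \nabla f(x ^{\ast}) = \sum _i \lambda _i ^{\ast} \nabla H _i (x ^{\ast}) }$ used above, which is why the critical point of ${ \mathcal{L}(x, \lambda) = f(x) + \lambda ^T H(x) }$ is ${ (x ^{\ast}, -\lambda ^{\ast}) . }$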