
I am reviewing the method of Lagrange multipliers, and this time it strikes me: why don't we just eliminate the multiplier $\lambda$ once and for all and work with the remaining equations, since we are (mostly) only interested in locating the points at which extrema occur? I believe that, for most purposes, it is safe to assume that $\lambda$ does not vanish (see this for example), and even if we want to be careful, we only need to check that particular case. So, instead of solving for $\lambda$ and plugging it back into the equations to compute $x, y, z$, we could eliminate $\lambda$ from the start and obtain $x, y, z$ directly.

So, for instance, for a two-variable situation, we might want to recast the equations as

$\frac{f_x(x, y)}{f_y(x, y)}=\frac{g_x(x, y)}{g_y(x, y)}$.

I figure there must be some difficulties with this approach, since many sample solutions I see do compute $\lambda$; what are those difficulties?
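To make this concrete, here is a minimal sympy sketch of the eliminated system on a toy problem of my own (not from any textbook), with the ratio cross-multiplied to avoid dividing by a possibly vanishing partial derivative:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x + y                      # toy objective (my example)
g = x**2 + y**2 - 1            # toy constraint: the unit circle

# Cross-multiplied form f_x*g_y - f_y*g_x = 0 of the ratio equation,
# which avoids dividing by a possibly vanishing partial derivative.
elim = sp.diff(f, x)*sp.diff(g, y) - sp.diff(f, y)*sp.diff(g, x)

for sol in sp.solve([elim, g], [x, y], dict=True):
    print(sol, f.subs(sol))    # x = y = +-sqrt(2)/2, where f = -+sqrt(2)
```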

tzy
  • For some problems, the values of the multipliers have significance, for others, they don’t. If you don’t need to know those values, and can eliminate those additional variables from the system, by all means do so. For that matter, many of the examples and exercises presented when learning this method can be solved, often more easily, in other ways. – amd Dec 27 '19 at 19:22
  • What are the other methods? Examples I mean. – tzy Dec 28 '19 at 08:18
  • A common early exercise is to optimize some linear function constrained to a quadric. That can be solved by finding tangent hyperplanes parallel to the level surfaces of the linear function. Another is finding the distance to some quadric, which can often be recast as an eigenvalue problem. If there are multiple constraints and one of them is linear, that can be used immediately to eliminate a variable and avoid introducing an extraneous multiplier. And so on. – amd Dec 29 '19 at 21:07
  • Also, many problems given as exercises in Lagrange multipliers can be more easily solved using the AM-GM or Cauchy-Schwarz inequalities. – amd Jan 07 '20 at 00:54

3 Answers


Lagrange multipliers say that, given a function $f$ that you are trying to minimize/maximize and a constraint $g$, you can use the equation $\nabla g=\lambda\nabla f$ to create a system of equations in $x$, $y$, $z$, and $\lambda$. With as many equations as variables, one can solve for $x$, $y$, and $z$, and the quickest way to do so sometimes requires solving for $\lambda$ first.

If I am understanding your question correctly, you are asking why we cannot just set $\nabla g=\nabla f$ (functionally taking $\lambda=1$) and thus ignore $\lambda$. The issue is that this misunderstands the motivation for the method of Lagrange multipliers: the method does not work without the nonzero scaling factor $\lambda$.

The method guarantees that $f$ can only be maximized/minimized where $\nabla f$ is parallel to $\nabla g$, since the smallest/largest level curve/surface of $f$ should just barely touch the constraint curve/surface $g=0$, and so their gradients should be parallel. It does not guarantee that these gradients are equal, so we cannot assume they have the same magnitude or orientation. The $\lambda$ is necessary in that sense.

  • No, I am not saying that we should set $\lambda=1$. I am saying that, for instance, to maximize $z=f(x, y)$ subject to the constraint $g(x, y)=0$, using the constraint and the Lagrange equations we have three equations in three unknowns. One of the unknowns, $\lambda$, is not of interest, so why don't we just eliminate it once and for all and deal with two equations in two unknowns? – tzy Dec 27 '19 at 08:40
  • Like with any system of equations, sometimes it is possible to ignore a variable, while most of the time it is not. Since $\lambda$ often shows up in all the equations in the system, you cannot ignore it. Take, for example, $f(x,y)=xy$ with the constraint $x^2 + y^2=1$: the system is (1) $\lambda y=2x$, (2) $\lambda x=2y$, and (3) $x^2 + y^2 =1$. You can eliminate $\lambda$ quickly here, although notice that it appears in more than one equation: it cannot be treated as a system of two variables. It is true that one can often eliminate $\lambda$ early on, but rarely can it be ignored. – A. Khoja Dec 27 '19 at 19:37
  • $\lambda$ only appears twice, in (1) and (2), and I'd have eliminated $\lambda$ right away (see the sketch below). Would you? – tzy Dec 28 '19 at 08:17
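Here is a minimal sympy sketch (sympy assumed available) of how $\lambda$ drops out of the system in the comment above:

```python
# Eliminate lambda from the comment's system:
# (1) lam*y = 2x, (2) lam*x = 2y, (3) x^2 + y^2 = 1.
import sympy as sp

x, y = sp.symbols('x y', real=True)

# Multiplying (1) by x and (2) by y gives lam*x*y = 2x^2 and
# lam*x*y = 2y^2, so x^2 = y^2 and lambda is gone. (No candidates are
# lost: x = 0 or y = 0 is incompatible with (1)-(3) on the circle.)
for sol in sp.solve([x**2 - y**2, x**2 + y**2 - 1], [x, y], dict=True):
    print(sol)  # the four candidates (+-sqrt(2)/2, +-sqrt(2)/2)
```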

The idea behind finding extrema of $f(v)$ with constraint $g(v)=0$ is that you are excluding points at which $\nabla f$ and $\nabla g$ point in different directions. The remaining points are: points where $\nabla f$ or $\nabla g$ doesn't exist; points where both exist and at least one is $0$; and points where both exist, are nonzero, and point in the same direction (note: "same direction" includes the geometrically opposite direction, so we're talking about non-oriented directions). These should be exceptional points, few enough that you can check them by hand.

For a typical calculus problem, $\nabla f$ and $\nabla g$ always exist and $\nabla g$ is always nonzero on the constraint, so you're just looking for points at which $\nabla f=\lambda\nabla g$ for some $\lambda$.

Now, of course, the method is also phrased as $\nabla g=\lambda\nabla f$. This is bad because it misses the case where $\nabla f=0$. If it is phrased this way, however, you can assume $\lambda\neq 0$, not for any mathematical reason, but because calculus exercises won't give you an example where $\nabla g=0$ on the constraint.

But it is possible to do something else instead. For example, if these are functions on a plane, then each gradient is a 2-dimensional vector; to check whether one is a multiple of the other, you can compute the determinant of the $2\times 2$ matrix formed by the two gradients and see whether you get $0$. Similarly, in 3 dimensions you can check whether the cross product of the gradients is the zero vector.
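As a rough numeric illustration of these two tests (my own sketch, assuming numpy; the tolerances are arbitrary):

```python
import numpy as np

def parallel_2d(grad_f, grad_g, tol=1e-12):
    # In the plane, the gradients are parallel iff the 2x2 determinant
    # f_x*g_y - f_y*g_x vanishes.
    return abs(grad_f[0]*grad_g[1] - grad_f[1]*grad_g[0]) < tol

def parallel_3d(grad_f, grad_g, tol=1e-12):
    # In 3 dimensions, the gradients are parallel iff their cross
    # product is the zero vector.
    return np.linalg.norm(np.cross(grad_f, grad_g)) < tol

# f = xy on the unit circle at the candidate (1/sqrt(2), 1/sqrt(2)):
x = y = 1/np.sqrt(2)
print(parallel_2d((y, x), (2*x, 2*y)))        # True
print(parallel_3d((1, 2, 3), (2, 4, 6.0)))    # True: (2,4,6) = 2*(1,2,3)
```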

Just for completeness' sake, here is an example with $\nabla g=0$:

Find the minimum of $f(x,y)=x^{2}+y^{2}$ subject to the constraint $y^{2}+2y+1=0$.
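Spelling this example out (the details are routine but worth recording): the constraint is $(y+1)^2=0$, i.e. the line $y=-1$, on which $\nabla g=(0,\,2y+2)=(0,0)$ identically. So $\nabla f=\lambda\nabla g$ forces $\nabla f=(2x,2y)=(0,0)$, whose only solution $(0,0)$ is not on the constraint, and the multiplier equation produces no candidates at all; yet the minimum plainly exists, at $(0,-1)$ with $f=1$. The determinant test of the previous paragraph, $f_x g_y - f_y g_x = 2x(2y+2)=0$, holds everywhere on the line $y=-1$, so every constraint point is a candidate and comparing values of $f$ picks out $(0,-1)$.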

  • I understand the concept, but my question was why we bother to compute the values of $\lambda$ when we're really only interested in the $x, y, z$? – tzy Dec 27 '19 at 17:56
  • @BenjaminT:We don't need to compute $\lambda$ at all. It's either computed as an intermediate step that is easier to solve before solving for the point; or depends on specific problem, it might have an additional interpretation beyond being just a multiplier. But as far as constrained optimization goes, it's completely irrelevant to compute it. Which is why I also mention the determinant and cross product. – calcstudent Dec 27 '19 at 18:02
  • Yes, I think that's the point. I get that sometimes solving for $\lambda$ can make life easier, but for a two-variable case it's very easy to eliminate $\lambda$ and focus on working with just $x, y$. Makes life easier, but I see a lot of samples that don't do so and I'm just wondering why, whether there's any special reason I'm not aware of. – tzy Dec 28 '19 at 08:20

In principle, you can always eliminate Lagrange multipliers. For example, if you want to find extrema of $f(x,y)$ under the constraint $g(x,y)=0$, solving two equations $f_x g_y-f_y g_x=0$ and $g=0$ will give you the same candidates as those given by the Lagrange multiplier method (in addition, the case of $\nabla g=0$, often forgotten by students when using the Lagrange multiplier method, is naturally included in this way). The reason is that $\nabla f$ and $\nabla g$ being linearly dependent is equivalent to $f_x g_y-f_y g_x=0$; it is also equivalent to the existence of $\lambda$ such that $\nabla f=\lambda \nabla g$ (or $\nabla g=0$).
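A small sympy sketch (sympy assumed) of this eliminated system on the degenerate example from the previous answer, where $\nabla g=0$ on the whole constraint:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 + y**2
g = y**2 + 2*y + 1                # (y + 1)^2: grad g = 0 wherever g = 0

elim = sp.diff(f, x)*sp.diff(g, y) - sp.diff(f, y)*sp.diff(g, x)
print(sp.factor(elim))            # 4*x*(y + 1)
print(sp.solve(g, y))             # [-1]: the constraint is the line y = -1
print(elim.subs(y, -1))           # 0: holds identically on the constraint
```

Here the eliminated equation holds at every constraint point, so all of them are candidates, and comparing $f$-values gives the minimum at $(0,-1)$; the plain multiplier equation $\nabla f=\lambda\nabla g$ would have produced no candidates at all.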

There are many undergraduate problems for which this approach produces solutions with shorter calculations. Problems with linear constraints can be exceptions: for example, the maximization of the entropy $-\sum_{j=1}^{n}x_j \log_2 x_j$ under the constraint $\sum_{j=1}^{n}x_j=1$ (and $x_j \geq 0$) is a little easier to solve using the Lagrange multiplier method.
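To spell out why the multiplier is convenient here (a routine computation on the entropy form above): stationarity of $-\sum_j x_j\log_2 x_j-\lambda\left(\sum_j x_j-1\right)$ in each $x_j$ gives $$-\log_2 x_j-\frac{1}{\ln 2}-\lambda=0,$$ so every $x_j$ equals the same constant, and the constraint then forces $x_j=1/n$. The single symbol $\lambda$ decouples the $n$ variables; eliminating it up front would tangle them together again.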

Now, if we can in principle avoid the use of Lagrange multipliers, why should we introduce them at all? One reason is their importance in numerical analysis. I'm not a specialist in numerical analysis, but I can name two algorithms that use the idea of Lagrange multipliers: the Augmented Lagrangian Method (https://en.wikipedia.org/wiki/Augmented_Lagrangian_method) and the Primal-Dual Interior-Point Method (https://en.wikipedia.org/wiki/Interior-point_method). Of course, there are algorithms that avoid the use of Lagrange multipliers (simple Newton methods, for example), and the comparison should be made on a case-by-case basis.
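To give a feel for (i), here is a minimal sketch of the augmented-Lagrangian idea on a toy problem (the problem, step sizes, and iteration counts are my own choices, not taken from the references above):

```python
# Minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 = 0.
# The multiplier estimate lam is explicit algorithmic state here:
# it is updated at every outer step, not eliminated.
import numpy as np

mu, lam = 10.0, 0.0              # penalty weight, multiplier estimate
v = np.array([0.0, 0.0])         # current iterate (x, y)

for _ in range(50):              # outer loop: multiplier updates
    for _ in range(200):         # inner loop: minimize the augmented
        g = v[0] + v[1] - 1.0    # Lagrangian f + lam*g + (mu/2)*g^2
        grad = 2*v + (lam + mu*g)*np.array([1.0, 1.0])
        v -= 0.01*grad           # plain gradient step
    lam += mu*(v[0] + v[1] - 1.0)

print(v, lam)  # tends to (0.5, 0.5) with lam near -1, matching
               # grad f + lam * grad g = 0 at the solution
```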

Edit: Another motivation might be the application to the calculus of variations. For example, consider the problem of finding a curve $y=y(x)\geq 0$ with $y(-1)=y(1)=0$ and a given arc length $L=\int_{-1}^{1}\sqrt{1+(dy/dx)^2}\, dx$ that maximizes the area $A=\int_{-1}^{1}y(x)\, dx$ between $y=0$ and $y=y(x)$ (Dido's problem). Applying the Lagrange multiplier method, we get an ODE $$ y-\lambda \sqrt{1+(dy/dx)^2}+\frac{\lambda(dy/dx)^2}{\sqrt{1+(dy/dx)^2}}=k, $$ where $\lambda$ is the Lagrange multiplier and $k$ is a constant of integration (derivation: write down the Beltrami identity for the Lagrangian $L(y,dy/dx)=y-\lambda \sqrt{1+(dy/dx)^2}$). Solving the ODE with the boundary conditions $y(-1)=y(1)=0$ shows that the curve $y(x)$ is an arc of a circle, namely $\{ x^2+(y-k)^2=\lambda^2,\, y\geq 0 \}$ with the relation $1+k^2=\lambda^2$ coming from the boundary conditions. The arc length condition $L=\int_{-1}^{1}\sqrt{1+(dy/dx)^2}\, dx$ finally determines $\lambda$ and $k$. In this problem, it seems difficult to arrive at the conclusion by eliminating the Lagrange multiplier $\lambda$ before solving the ODE. In this sense, the introduction of the Lagrange multiplier seems essential.
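A routine simplification may help when reading the ODE above: the left-hand side collapses, since $$y-\lambda\sqrt{1+(dy/dx)^2}+\frac{\lambda(dy/dx)^2}{\sqrt{1+(dy/dx)^2}}=y-\frac{\lambda}{\sqrt{1+(dy/dx)^2}},$$ so the equation reads $(y-k)\sqrt{1+(dy/dx)^2}=\lambda$. Squaring gives the separable equation $(y-k)^2\bigl(1+(dy/dx)^2\bigr)=\lambda^2$, whose solutions are the circles $(x-c)^2+(y-k)^2=\lambda^2$; the symmetry of the boundary conditions fixes $c=0$.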

Overall, to convince someone of the importance of introducing the Lagrange multiplier, we could argue for its importance in (i) numerical analysis or in (ii) the calculus of variations. I guess (ii) is the main reason the Lagrange multiplier method appears frequently in traditional textbooks (although the use of the Lagrange multiplier does not seem essential, and is even redundant in some cases, for solving the problems in those books), since the tradition seems to have started before the widespread use of computers. I know that both (i) and (ii) may be difficult for undergraduate students, but I could not find undergraduate-level problems, solvable by hand, that demonstrate the indispensability of "$\lambda$".

Kai
  • Thanks for your contribution! I certainly like what you pointed out: that we can put the system in the form $f_xg_y-f_yg_x=0$, $g=0$ and solve, without dealing with the case $\nabla g=0$ separately. Students do tend to forget about this case, or to think that we're being too pedantic, which we aren't. By the way, my original question was why we bother going through the redundant step of the Lagrange multiplier at all in a basic Calculus course; I guess it has to do with numerical analysis, as you mentioned. – tzy Jun 15 '22 at 04:34
  • Thanks for the comment. I added another argument involving the calculus of variations. I was (and still am) asking the same question when preparing my course in Calculus. Every problem I found in textbooks could be solved without the use of Lagrange multipliers (and in many cases with fewer calculations). I was surprised that I never thought about this when I was an undergraduate student. Seneca was right about this: by teaching, we learn. – Kai Jun 16 '22 at 10:32