In principle, you can always eliminate Lagrange multipliers. For example, if you want to find the extrema of $f(x,y)$ under the constraint $g(x,y)=0$, solving the two equations $f_x g_y-f_y g_x=0$ and $g=0$ gives the same candidates as the Lagrange multiplier method (moreover, the case $\nabla g=0$, often forgotten by students when using the Lagrange multiplier method, is naturally included in this way). The reason is that $\nabla f$ and $\nabla g$ being linearly dependent is equivalent to $f_x g_y-f_y g_x=0$, which in turn is equivalent to the existence of a $\lambda$ such that $\nabla f=\lambda \nabla g$ (or to $\nabla g=0$).
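As a concrete illustration (a toy example of my own, not from the discussion above): extremize $f(x,y)=x+y$ on the unit circle $g(x,y)=x^2+y^2-1=0$ by solving $f_x g_y - f_y g_x = 0$ together with $g=0$, with no $\lambda$ in sight. A sketch using SymPy:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x + y                  # objective
g = x**2 + y**2 - 1        # constraint g(x, y) = 0

# Eliminate the multiplier: solve f_x*g_y - f_y*g_x = 0 together with g = 0.
cross = sp.diff(f, x)*sp.diff(g, y) - sp.diff(f, y)*sp.diff(g, x)
candidates = sp.solve([cross, g], [x, y])
# Two candidate points: (1/sqrt(2), 1/sqrt(2)) and (-1/sqrt(2), -1/sqrt(2)).
```

Here `cross` is $2y-2x$, so the candidates are exactly the points $y=x$ on the circle, matching what the multiplier method produces.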
There are many undergraduate problems for which this approach yields shorter calculations. Problems with linear constraints are exceptions: for example, the maximization of the entropy $-\sum_{j=1}^{n}x_j \log_2 x_j$ under the constraint $\sum_{j=1}^{n}x_j=1$ (and $x_j \geq 0$) is a little easier to solve using the Lagrange multiplier method.
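For the record, the stationarity conditions in this example are easy to check by hand or numerically; the sketch below (plain Python, with $n=3$ chosen arbitrarily) verifies that the uniform distribution, together with the corresponding multiplier, makes every partial derivative of the Lagrangian vanish:

```python
import math

n = 3
x = [1.0/n]*n                                 # uniform distribution
# From stationarity: -(ln x_j + 1)/ln 2 - lam = 0 at x_j = 1/n.
lam = -(math.log(1.0/n) + 1.0)/math.log(2.0)

# d/dx_j [ -sum_j x_j log2 x_j - lam*(sum_j x_j - 1) ]
for xj in x:
    grad = -(math.log(xj) + 1.0)/math.log(2.0) - lam
    assert abs(grad) < 1e-12                  # stationarity holds

assert abs(sum(x) - 1.0) < 1e-12              # constraint holds
```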
Now, if we can in principle avoid the use of Lagrange multipliers, why should we introduce them? One reason would be their importance in numerical analysis. I'm not a specialist in numerical analysis, but I can name two algorithms that use the idea of Lagrange multipliers: the augmented Lagrangian method (https://en.wikipedia.org/wiki/Augmented_Lagrangian_method) and the primal-dual interior-point method (https://en.wikipedia.org/wiki/Interior-point_method). Of course, there are algorithms that avoid Lagrange multipliers (simple Newton methods, for example), and the comparison should be made on a case-by-case basis.
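To make (i) slightly more concrete, here is a minimal sketch of the augmented Lagrangian idea on a toy problem of my own choosing: minimize $f(x,y)=x^2+y^2$ subject to $g(x,y)=x+y-1=0$, whose exact answer is $(1/2,1/2)$ with multiplier $\lambda=-1$ under the convention $L=f+\lambda g$.

```python
def augmented_lagrangian(mu=10.0, outer=20, inner=1000, lr=0.01):
    """Minimize f(x,y)=x^2+y^2 subject to g(x,y)=x+y-1=0
    via the augmented Lagrangian L_A = f + lam*g + (mu/2)*g^2."""
    lam, x, y = 0.0, 0.0, 0.0
    for _ in range(outer):
        # Inner loop: gradient descent on L_A with the multiplier frozen.
        for _ in range(inner):
            g = x + y - 1.0
            x -= lr*(2.0*x + lam + mu*g)   # dL_A/dx
            y -= lr*(2.0*y + lam + mu*g)   # dL_A/dy
        lam += mu*(x + y - 1.0)            # multiplier update on the residual
    return x, y, lam

x, y, lam = augmented_lagrangian()  # converges to x = y = 0.5, lam = -1
```

The point of the sketch is that the multiplier $\lambda$ is not eliminated but *estimated iteratively*: each outer step nudges $\lambda$ by $\mu\, g(x,y)$, and the penalty term keeps the inner minimization well conditioned without sending $\mu\to\infty$.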
Edit: Another motivation might be its application to the calculus of variations. For example, consider the problem of finding a curve $y=y(x)\geq 0$ with $y(-1)=y(1)=0$ and a given arc length $L=\int_{-1}^{1}\sqrt{1+(dy/dx)^2}\, dx$ that maximizes the area $A=\int_{-1}^{1}y(x)\, dx$ between $y=0$ and $y=y(x)$ (Dido's problem). Applying the Lagrange multiplier method, we get an ODE
$$
y-\lambda \sqrt{1+(dy/dx)^2}+\frac{\lambda(dy/dx)^2}{\sqrt{1+(dy/dx)^2}}=k,
$$
where $\lambda$ is the Lagrange multiplier and $k$ is an integration constant (derivation: write down the Beltrami identity for the Lagrangian $L(y,dy/dx)=y-\lambda \sqrt{1+(dy/dx)^2}$). Solving the ODE with the boundary conditions $y(-1)=y(1)=0$ shows that the curve $y(x)$ is an arc of a circle, namely $\{ x^2+(y-k)^2=\lambda^2,\, y\geq 0 \}$ with the relation $1+k^2=\lambda^2$ coming from the boundary conditions. The arc length condition $L=\int_{-1}^{1}\sqrt{1+(dy/dx)^2}\, dx$ then determines $\lambda$ and $k$. In this problem, it seems difficult to me to reach the conclusion by eliminating the Lagrange multiplier $\lambda$ before solving the ODE. In this sense, the introduction of the Lagrange multiplier seems essential.
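As a quick sanity check (the numerical values $\lambda=2$, $k=-\sqrt{3}$ are my own choice, fixed only by the relation $1+k^2=\lambda^2$ rather than by a prescribed arc length $L$), one can verify numerically that the circular arc satisfies the ODE above:

```python
import math

lam, k = 2.0, -math.sqrt(3.0)          # chosen so that 1 + k^2 = lam^2

def y(x):                              # upper arc of x^2 + (y - k)^2 = lam^2
    return k + math.sqrt(lam**2 - x**2)

def yp(x):                             # dy/dx on the arc
    return -x/math.sqrt(lam**2 - x**2)

assert abs(y(-1.0)) < 1e-12 and abs(y(1.0)) < 1e-12   # boundary conditions

for x in (-0.9, -0.3, 0.0, 0.4, 0.8):
    s = math.sqrt(1.0 + yp(x)**2)
    lhs = y(x) - lam*s + lam*yp(x)**2/s
    assert abs(lhs - k) < 1e-10        # ODE: lhs equals the constant k
```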
Overall, to convince someone of the importance of introducing the Lagrange multiplier, we could argue its importance either in (i) numerical analysis or in (ii) the calculus of variations. I guess (ii) is the main reason why the Lagrange multiplier method appears so frequently in traditional textbooks (even though the use of the multiplier does not seem to be essential, and is even redundant in some cases, for solving the problems in those books), since the tradition seems to have started before the widespread use of computers. I know that both (i) and (ii) may be difficult for undergraduate students, but I could not find undergraduate-level problems, solvable by hand, that demonstrate the indispensability of "$\lambda$".