18

I am trying to study optimization problems, Lagrange duality, and related topics. I came across a presentation online that claims to show the geometric interpretation of duality and Slater's condition for a simple problem with a single constraint:

$$\begin{equation*} \begin{aligned} & \underset{x}{\text{minimize}} & & f_0(x) \\ & \text{subject to} & & f_1(x) \leq 0. \end{aligned} \end{equation*}$$

Here is the slide:

[Image: slide showing the set $\mathcal G$ in the $(u,t)$ plane; image not available]

Now, I understand how $p^*$ is depicted: the primal problem has the constraint $f_1(x) \leq 0$, so we only consider points with $u \leq 0$. Among those, the point $(u,t)$ with the smallest $t$ value is picked.

But I don't understand at all how I should interpret the dual function $g(\lambda)$ to begin with. $g(\lambda)$ is depicted as a line (hyperplane), but according to its definition it must be a scalar value. The dual problem is $g(\lambda) = \inf_x(f_0(x) + \lambda f_1(x))$ where $\lambda \geq 0$. So, for a given $\lambda \geq 0$, we must go and seek a point $(u,t)$ in $\mathcal G$ which minimizes $g(\lambda)$. How is this connected with a hyperplane? We are in $(u,t)$ space, which has no $\lambda$ parametrization in it. I would really appreciate some clarification here.

  • The dual function is $g(\lambda) = \inf_x (f_0(x) + \lambda f_1(x))$. The dual problem is to maximize $g(\lambda)$ subject to $\lambda \geq 0$. – littleO Feb 23 '15 at 04:25

3 Answers

7

A key idea in convex analysis is to think of a set (such as $\mathcal G$) in terms of the half-spaces that contain it.

For a given $\lambda$, you could imagine all the hyperplanes of the form $\lambda u + t = \text{const}$ for which $\mathcal G$ is contained in the upper half space.

And what is the "best" choice of the constant on the right hand side?

The "best" choice is $g(\lambda)$, because that is the largest constant such that $\mathcal G$ is contained in the upper half space of $\lambda u + t = \text{const}$.

So, you can think of $\lambda u + t = g(\lambda)$ as a hyperplane for which $\mathcal G$ is contained in the upper half space. Moreover, for this value of $\lambda$, this is the "best" hyperplane, in the sense that the containment is as tight as possible. If you shifted this hyperplane any higher, containment would be violated.
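To make this concrete, here is a minimal numerical sketch. The toy problem (minimize $(x-2)^2$ subject to $x - 1 \leq 0$), the grid, and the names `f0`, `f1`, `g` are assumptions chosen for illustration; they are not taken from the slide. The code samples $\mathcal G$, approximates $g(\lambda)$ as the smallest value of $t + \lambda u$ over the samples, and checks that every sampled point lies on or above the line $\lambda u + t = g(\lambda)$.

```python
import numpy as np

# Hypothetical toy problem (an assumption, not from the slide):
#   minimize (x - 2)^2  subject to  x - 1 <= 0
def f0(x):
    return (x - 2.0) ** 2

def f1(x):
    return x - 1.0

# Sample the set G = {(f1(x), f0(x))} on a grid of x values.
xs = np.linspace(-3.0, 5.0, 8001)
u, t = f1(xs), f0(xs)

def g(lam):
    """Grid approximation of g(lam) = inf over G of (t + lam * u)."""
    return float(np.min(t + lam * u))

lam = 1.0
intercept = g(lam)  # closed form for this toy problem: lam - lam**2 / 4

# Height of each sampled point of G above the line lam*u + t = g(lam).
gap = t + lam * u - intercept
print("g(lam) =", intercept)
print("all points on or above the line:", bool(gap.min() >= 0.0))
```

Raising the constant above `g(lam)` by any positive amount would push at least one sampled point below the line, which is the sense in which this hyperplane is the tightest one for the chosen slope.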

littleO
  • 54,048
  • Why for a given $\lambda$? Aren't I supposed to maximize over $\lambda$, so it is a variable? – Kong Mar 19 '18 at 10:36
  • @kong Before we worry about maximizing $g$, we should first understand visually the definition of $g$. We will maximize $g$ later. – littleO Mar 19 '18 at 10:48
  • Based on my understanding of what is written, $\mathcal G$ is the set of all points $(f_1(x), f_0(x))$. So $u = f_1(x)$ and $t = f_0(x)$. That is all I understand. After that, I feel that the infimum should be over only $t$, because that is what we are trying to minimize. – Kong Mar 19 '18 at 11:16
  • 1
    @kong Well, while our ultimate goal is to minimize $f_0(x)$ subject to the given constraint, in this question we are merely trying to understand visually the definition of $g$. That's different. We'll worry about solving the optimization problem later. For now, we visualize $g$ by noting that $g(\lambda) := \inf_{x \in \mathcal D} f_0(x) + \lambda f_1(x)$, which is equal to $\inf_{(u,t) \in \mathcal G} t + \lambda u$, which suggests visualizing hyperplanes as shown in the picture. – littleO Mar 19 '18 at 11:29
  • Thank you for helping me. The picture on the right shows finding $d^*$ by drawing a line (hyperplane). My main confusion arises from not knowing how I would even decide on that line. – Kong Mar 19 '18 at 11:57
  • 1
    @kong Good question. For any fixed value of $\lambda$, the hyperplane $\{ (u,t) \mid \lambda u + t = g(\lambda) \}$ lies below $\mathcal G$ (and this hyperplane just barely touches $\mathcal G$). By plugging in $u = 0$ you can see that this hyperplane intersects the vertical axis at the point $(0,g(\lambda))$. That's a good picture. Now, we are ready to try to visualize the dual problem. How should we pick $\lambda \geq 0$ to make $g(\lambda)$ as large as possible? Visually, we want our hyperplane to intersect the vertical axis at the highest possible point. This is pictured on the right. – littleO Mar 19 '18 at 12:05
  • 1
    Keep in mind that $-\lambda$ is the slope of the hyperplane $\{(u,t) \mid \lambda u + t = g(\lambda) \}$. When we try out different values of $\lambda$, visually we are adjusting the slope of this hyperplane. – littleO Mar 19 '18 at 12:14
  • Thank you! It is much better now, but I have another possibly stupid question. What stops me from choosing a very large $\lambda$ to get $d^* > p^*$? There seems to be no constraint that prevents the hyperplane from cutting through $\mathcal G$? – Kong Mar 19 '18 at 12:24
  • 1
    @kong That's a good question; it's a little tricky to get used to visualizing this situation. From the definition of $g$, we can see that for any fixed value of $\lambda$ we have $g(\lambda) = \inf_{(u,t) \in \mathcal G} t + \lambda u$. This fact implies that $g(\lambda) \leq t + \lambda u$ for all $(u,t) \in \mathcal G$. So, every point in the set $\mathcal G$ lies on or above the hyperplane $\{(u,t) \mid \lambda u + t = g(\lambda)\}$. In other words, this hyperplane can't cut through $\mathcal G$. Rather, this hyperplane just barely touches $\mathcal G$ from below. – littleO Mar 19 '18 at 17:46
5

See lecture 8 of Stanford's Convex Optimization course. At 1:04 (one hour and four minutes from the start), Stephen Boyd starts explaining the geometric interpretation.

http://www.youtube.com/watch?v=FJVmflArCXc

His book (available for free on his website) gives more formal details.

George
  • 171
1

As you said, $g(\lambda)$ for a given $\lambda$ is a scalar. Suppose $\lambda = 3$ and $g(\lambda) = 5$. That defines the line $3u+t=5$. The value of the dual function at this $\lambda$ is $5$ (not the line $3u+t=5$). The $t$-intercept of the line $3u+t=5$ is $5$, which is $g(\lambda)$. You get the dual function by varying $\lambda$ (which must be nonnegative). Therefore, the dual function defines a family of lines; find the largest intercept among them. That is $d^*$, which satisfies $d^* \leq p^*$ by weak duality. If there is more than one constraint, you no longer have supporting lines but supporting hyperplanes.

In summary, the lines are not the dual values. A dual value (which is a scalar) helps to define the line.
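The "family of lines" picture can be checked numerically. The sketch below uses a hypothetical toy problem (minimize $(x-2)^2$ subject to $x - 1 \leq 0$), which is an assumption for illustration and not part of the question. It sweeps $\lambda \geq 0$ over a grid, records the intercept $g(\lambda)$ of each supporting line, and compares the largest intercept $d^*$ with $p^*$. For this particular problem, $g(\lambda) = \lambda - \lambda^2/4$ in closed form, so both values should come out close to $1$.

```python
import numpy as np

# Hypothetical toy problem (an assumption for illustration):
#   minimize (x - 2)^2  subject to  x - 1 <= 0
xs = np.linspace(-3.0, 5.0, 8001)
u = xs - 1.0            # u = f1(x), the constraint value
t = (xs - 2.0) ** 2     # t = f0(x), the objective value

def dual_value(lam):
    """Grid approximation of g(lam): the t-intercept of the supporting line."""
    return float(np.min(t + lam * u))

# One line per lambda; each line's intercept is the scalar g(lambda).
lams = np.linspace(0.0, 5.0, 501)
intercepts = np.array([dual_value(lam) for lam in lams])

d_star = float(intercepts.max())       # highest intercept over lambda >= 0
best_lam = float(lams[intercepts.argmax()])
p_star = float(t[u <= 0.0].min())      # smallest t among feasible points

print("d* ~", d_star, "attained near lambda =", best_lam)
print("p* ~", p_star)
print("weak duality holds on the grid:", d_star <= p_star + 1e-9)
```

Here the two values agree, which is consistent with the problem being convex and strictly feasible (for example at $x = 0$). For a nonconvex $\mathcal G$ the same sweep could leave a gap $d^* < p^*$.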

Leila
  • 111