I was learning about support vector machines from MIT OpenCourseWare, and I have figured out most of it. I understand why we try to minimize $\frac{1}{2}\|w\|^2$. What I did not get is why we then try to maximize the Lagrange expression, as stated at 35:56 in the YouTube video.
The width of the margin is $2/\|w\|$. Since we want to maximize the width, we want to minimize $\|w\|$, which is equivalent to minimizing $\frac{1}{2}\|w\|^2$. Now we use the Lagrange expression, $$L = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i\left(y_i(w \bullet x_i + b) - 1\right).$$ To find the minimum of $\frac{1}{2}\|w\|^2$, we set the gradient to zero, $\nabla L = 0$. From the $\nabla L = 0$ equation we get $\sum_i \alpha_i y_i = 0$ and $w = \sum_i \alpha_i y_i x_i$. Substituting these two equations back into the Lagrange expression, we end up with $$L(w,b,\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} y_i y_j \alpha_i \alpha_j (x_i \bullet x_j).$$
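For reference, here is the algebra behind that last substitution, plugging $w = \sum_i \alpha_i y_i x_i$ into $L$ and using $\sum_i \alpha_i y_i = 0$:
$$\begin{aligned}
L &= \frac{1}{2}\Big\|\sum_i \alpha_i y_i x_i\Big\|^2 - \sum_i \alpha_i\Big[y_i\Big(\sum_j \alpha_j y_j (x_j \bullet x_i) + b\Big) - 1\Big] \\
&= \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \bullet x_j) - \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \bullet x_j) - b\sum_i \alpha_i y_i + \sum_i \alpha_i \\
&= \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \bullet x_j),
\end{aligned}$$
where the $b$ term vanishes because $\sum_i \alpha_i y_i = 0$.

I even tried a quick numerical sanity check (my own toy example, not from the lecture; I use `scipy.optimize.minimize` and maximize by minimizing the negative): maximizing this expression over $\alpha_i \geq 0$ subject to $\sum_i \alpha_i y_i = 0$ does recover the correct $w$, so maximizing clearly works, but I don't see why.

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data (my own example): one positive point, one negative point.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
K = X @ X.T  # Gram matrix of dot products x_i . x_j

def neg_dual(alpha):
    # Negative of L(alpha) = sum_i alpha_i - 1/2 sum_{i,j} y_i y_j alpha_i alpha_j (x_i . x_j);
    # scipy only minimizes, so maximizing L means minimizing -L.
    return -(alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y))

res = minimize(neg_dual, x0=np.zeros(2), method="SLSQP",
               bounds=[(0, None)] * 2,                               # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum_i alpha_i y_i = 0

alpha = res.x
w = (alpha * y) @ X  # w = sum_i alpha_i y_i x_i
print(alpha, w)      # gives alpha = [0.5, 0.5], w = [1, 0], so width 2/||w|| = 2
```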
I understand everything up to here, but then the professor says that we want to maximize $L$, and I don't understand why. Why are we trying to maximize the Lagrange expression when I thought we were trying to minimize it?