Mathematical formulation of Support Vector Machines?

Question

I'm trying to learn maths behind SVM (hard margin) but due to different forms of mathematical formulations I'm bit confused.

Assume we have two sets of points $\text{(i.e. positives, negatives)}$ one on each side of hyperplane $\pi$.
So the equation of the margin maximizing plane $\pi$ can be written as, $$\pi:\;W^TX+b = 0$$
- If $y\in$ $(1,-1)$ then,

$$\pi^+:\; W^TX + b=+1$$ $$\pi^-:\; W^TX + b=-1$$

Here $\pi^+$ and $\pi^-$ are parallel to plane $\pi$ and they are also parallel to each other. Now the objective would be to find a hyperplane $\pi$ which maximizes the distance between $\pi^+$ and $\pi^-$.

Here $\pi^+$ and $\pi^-$ are the hyperplanes passing through positive and negative support vectors respectively

According to wikipedia about SVM I've found that distance/margin between $\pi^+$ and $\pi^-$ can be written as, $$\hookrightarrow\frac{2}{||w||}$$
Now if I put together everything this is the constraint optimization problem we want to solve, $$\text{find}\;w_*,b_* = \underbrace{argmax}_{w,b}\frac{2}{||w||} \rightarrow\text{margin}$$ $$\hookrightarrow \text{s.t}\;\;y_i(w^Tx\;+\;b)\;\ge 1\;\;\;\forall\;x_i$$

Before proceeding to my doubts please do confirm if my understanding above is correct? If you find any mistakes please do correct me.

How to derive margin between $\pi^+$ and $\pi^-$ to be $\frac{2}{||w||}?$ I did find a similar question asked here but I couldn't understand the formulations used there? If possible can anyone explain it in the formulation I used above?
How can $y_i(w^Tx+b)\ge1\;\;\forall\;x_i$?

Fatemeh Asgarinejad · Accepted Answer · 2019-07-13T20:23:39.073

Your understandings are right.

deriving the margin to be $\frac{2}{|w|}$

we know that $w \cdot x +b = 1$

If we move from point z in $w \cdot x +b = 1$ to the $w \cdot x +b = 0$ we land in a point $\lambda$. This line that we have passed or this margin between the two lines $w \cdot x +b = 1$ and $w \cdot x +b = 0$ is the margin between them which we call $\gamma$

For calculating the margin, we know that we have moved from z, in opposite direction of w to point $\lambda$. Hence this margin $\gamma$ would be equal to $z - margin \cdot \frac{w}{|w|} = z - \gamma \cdot \frac{w}{|w|} =$ (we have moved in the opposite direction of w, we just want the direction so we normalize w to be a unit vector $\frac{w}{|w|}$)

Since this $\lambda$ point lies in the decision boundary we know that it should suit in line $w \cdot x + b = 0$ Hence we set is in this line in place of x:

$$w \cdot x + b = 0$$ $$w \cdot (z - \gamma \cdot \frac{w}{|w|}) + b = 0$$ $$w \cdot z + b - w \cdot \gamma \cdot \frac{w}{|w|}) = 0$$ $$w \cdot z + b = w \cdot \gamma \cdot \frac{w}{|w|}$$ we know that $w \cdot z +b = 1$ (z is the point on $w \cdot x +b = 1)$ $$1 = w \cdot \gamma \cdot \frac{w}{|w|}$$ $$\gamma= \frac{1}{w} \cdot \frac{|w|}{w} $$ we also know that $w \cdot w = |w|^2$, hence: $$\gamma= \frac{1}{|w|}$$ Why is in your formula 2 instead of 1? because I have calculated the margin between the middle line and the upper, not the whole margin.

How can $y_i(w^Tx+b)\ge1\;\;\forall\;x_i$?

We want to classify the points in the +1 part as +1 and the points in the -1 part as -1, since $(w^Tx_i+b)$ is the predicted value and $y_i$ is the actual value for each point, if it is classified correctly, then the predicted and actual values should be same so their production $y_i(w^Tx+b)$ should be positive (the term >= 0 is substituded by >= 1 because it is a stronger condition)

The transpose is in order to be able to calculate the dot product. I just wanted to show the logic of dot product hence, didn't write transpose

For calculating the total distance between lines $w \cdot x + b = -1$ and $w \cdot x + b = 1$:

Either you can multiply the calculated margin by 2 Or if you want to directly find it, you can consider a point $\alpha$ in line $w \cdot x + b = -1$. then we know that the distance between these two lines is twice the value of $\gamma$, hence if we want to move from the point z to $\alpha$, the total margin (passed length) would be: $$z - 2 \cdot \gamma \cdot \frac{w}{|w|}$$ then we can calculate the margin from here.

derived from ML course of UCSD by Prof. Sanjoy Dasgupta

Mathematical formulation of Support Vector Machines?

1 Answers1

Linked