4

I'm trying to learn maths behind SVM (hard margin) but due to different forms of mathematical formulations I'm bit confused.

  • Assume we have two sets of points $\text{(i.e. positives, negatives)}$ one on each side of hyperplane $\pi$.

  • So the equation of the margin maximizing plane $\pi$ can be written as, $$\pi:\;W^TX+b = 0$$

    • If $y\in$ $(1,-1)$ then,

$$\pi^+:\; W^TX + b=+1$$ $$\pi^-:\; W^TX + b=-1$$

  • Here $\pi^+$ and $\pi^-$ are parallel to plane $\pi$ and they are also parallel to each other. Now the objective would be to find a hyperplane $\pi$ which maximizes the distance between $\pi^+$ and $\pi^-$.

Here $\pi^+$ and $\pi^-$ are the hyperplanes passing through positive and negative support vectors respectively

  • According to wikipedia about SVM I've found that distance/margin between $\pi^+$ and $\pi^-$ can be written as, $$\hookrightarrow\frac{2}{||w||}$$

  • Now if I put together everything this is the constraint optimization problem we want to solve, $$\text{find}\;w_*,b_* = \underbrace{argmax}_{w,b}\frac{2}{||w||} \rightarrow\text{margin}$$ $$\hookrightarrow \text{s.t}\;\;y_i(w^Tx\;+\;b)\;\ge 1\;\;\;\forall\;x_i$$

Before proceeding to my doubts please do confirm if my understanding above is correct? If you find any mistakes please do correct me.

  • How to derive margin between $\pi^+$ and $\pi^-$ to be $\frac{2}{||w||}?$ I did find a similar question asked here but I couldn't understand the formulations used there? If possible can anyone explain it in the formulation I used above?
  • How can $y_i(w^Tx+b)\ge1\;\;\forall\;x_i$?
Anjith
  • 961
  • 2
  • 11
  • 20

1 Answers1

7

Your understandings are right.

deriving the margin to be $\frac{2}{|w|}$

we know that $w \cdot x +b = 1$

If we move from point z in $w \cdot x +b = 1$ to the $w \cdot x +b = 0$ we land in a point $\lambda$. This line that we have passed or this margin between the two lines $w \cdot x +b = 1$ and $w \cdot x +b = 0$ is the margin between them which we call $\gamma$

For calculating the margin, we know that we have moved from z, in opposite direction of w to point $\lambda$. Hence this margin $\gamma$ would be equal to $z - margin \cdot \frac{w}{|w|} = z - \gamma \cdot \frac{w}{|w|} =$ (we have moved in the opposite direction of w, we just want the direction so we normalize w to be a unit vector $\frac{w}{|w|}$)

Since this $\lambda$ point lies in the decision boundary we know that it should suit in line $w \cdot x + b = 0$ Hence we set is in this line in place of x:

$$w \cdot x + b = 0$$ $$w \cdot (z - \gamma \cdot \frac{w}{|w|}) + b = 0$$ $$w \cdot z + b - w \cdot \gamma \cdot \frac{w}{|w|}) = 0$$ $$w \cdot z + b = w \cdot \gamma \cdot \frac{w}{|w|}$$ we know that $w \cdot z +b = 1$ (z is the point on $w \cdot x +b = 1)$ $$1 = w \cdot \gamma \cdot \frac{w}{|w|}$$ $$\gamma= \frac{1}{w} \cdot \frac{|w|}{w} $$ we also know that $w \cdot w = |w|^2$, hence: $$\gamma= \frac{1}{|w|}$$ Why is in your formula 2 instead of 1? because I have calculated the margin between the middle line and the upper, not the whole margin.

  • How can $y_i(w^Tx+b)\ge1\;\;\forall\;x_i$?

We want to classify the points in the +1 part as +1 and the points in the -1 part as -1, since $(w^Tx_i+b)$ is the predicted value and $y_i$ is the actual value for each point, if it is classified correctly, then the predicted and actual values should be same so their production $y_i(w^Tx+b)$ should be positive (the term >= 0 is substituded by >= 1 because it is a stronger condition)

The transpose is in order to be able to calculate the dot product. I just wanted to show the logic of dot product hence, didn't write transpose


For calculating the total distance between lines $w \cdot x + b = -1$ and $w \cdot x + b = 1$:

Either you can multiply the calculated margin by 2 Or if you want to directly find it, you can consider a point $\alpha$ in line $w \cdot x + b = -1$. then we know that the distance between these two lines is twice the value of $\gamma$, hence if we want to move from the point z to $\alpha$, the total margin (passed length) would be: $$z - 2 \cdot \gamma \cdot \frac{w}{|w|}$$ then we can calculate the margin from here.

derived from ML course of UCSD by Prof. Sanjoy Dasgupta

Fatemeh Asgarinejad
  • 1,198
  • 1
  • 10
  • 18