Let's say we want to solve a linear regression problem by choosing the slope and bias that minimize the sum of squared errors. As an example, let the points be $x=[1,2,3]$ and $y=[1,2,2]$.
To solve this problem with linear algebra, I would find the orthogonal projection of $b$ onto the column space of $A$: $p = A\hat{x}$, where $\hat{x} = (A^TA)^{-1}A^Tb$. You can see the solution here.
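For the example points above, writing the data values $1,2,3$ into the first column of $A$ and ones into the second (this column ordering is my own convention; the linked solution may arrange it differently), I get

$$A = \begin{pmatrix}1 & 1\\ 2 & 1\\ 3 & 1\end{pmatrix},\quad b = \begin{pmatrix}1\\ 2\\ 2\end{pmatrix},\quad A^TA = \begin{pmatrix}14 & 6\\ 6 & 3\end{pmatrix},\quad A^Tb = \begin{pmatrix}11\\ 5\end{pmatrix},\quad \hat{x} = (A^TA)^{-1}A^Tb = \begin{pmatrix}1/2\\ 2/3\end{pmatrix},$$

so the slope would be $1/2$ and the bias $2/3$.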
However, this quadratic minimization problem can also be solved by taking partial derivatives and finding the local minimum of the error function.
These steps are shown in this answer, although I wasn't able to understand the mechanism.
I know that the minimization problem can be written as $||Ax-b||^2$, where the columns of $A$ form a basis for the column space, $x$ holds the unknown linear coefficients (since we are working with a linear function), and $b$ is the vector of outputs of the function. Thus $Ax-b$ represents the error between the "predicted" values and the actual values.
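Written out component-wise (again assuming data values $a_i$ in the first column of $A$ and ones in the second), this objective is just the sum of squared errors:

$$||Ax-b||^2 = \sum_{i=1}^{n}(x_1 a_i + x_2 - b_i)^2,$$

which is the function whose partial derivatives appear below.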
In the answer linked above, the author states that two partial derivatives must be set to zero and solved:
$\frac{\partial}{\partial x_1}||Ax-b||^2 = 0$ and $\frac{\partial}{\partial x_2}||Ax-b||^2 = 0$.
Suppose we have $n$ datapoints $a_i$; then, if $x_1$ represents the slope of the function and $x_2$ represents the bias, we would have:
$\frac{\partial}{\partial x_1}||Ax-b||^2 = 2\sum_{i=1}^{n}a_i(x_1a_i+x_2-b_i) = 0$
and
$\frac{\partial}{\partial x_2}||Ax-b||^2 = 2\sum_{i=1}^{n}(x_1a_i+x_2-b_i) = 0$
Multiplying the first equation by $x_1$, the second by $x_2$, and adding them finally implies:
$x_1\sum_{i=1}^{n}a_i(x_1a_i+x_2-b_i)+x_2\sum_{i=1}^{n}(x_1a_i+x_2-b_i) = 0$
$\implies \sum_{i=1}^{n} (x_1a_i+x_2)(x_1a_i+x_2-b_i)=0=Ax\cdot (Ax-b)$
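To convince myself numerically, here is a small sketch (Python with NumPy; the variable names are mine) that solves the two partial-derivative equations for the example data, which for these points reduce to $14x_1 + 6x_2 = 11$ and $6x_1 + 3x_2 = 5$, and checks the orthogonality condition $Ax\cdot (Ax-b)=0$:

```python
import numpy as np

# Example data: a_i are the inputs, b_i the observed outputs.
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 2.0])

# Design matrix: first column holds a_i (slope term), second column holds ones (bias term).
A = np.column_stack([a, np.ones_like(a)])

# Setting both partial derivatives to zero gives the 2x2 linear system A^T A x = A^T b
# (for this data: 14*x1 + 6*x2 = 11 and 6*x1 + 3*x2 = 5), which we solve directly.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print("slope x1 =", x_hat[0], ", bias x2 =", x_hat[1])   # expect 0.5 and 0.666...

# Same result as the projection formula (A^T A)^{-1} A^T b.
x_proj = np.linalg.inv(A.T @ A) @ A.T @ b
print("matches projection formula:", np.allclose(x_hat, x_proj))

# At the minimum the residual Ax - b is orthogonal to Ax, i.e. Ax . (Ax - b) = 0.
residual = A @ x_hat - b
print("Ax . (Ax - b) =", np.dot(A @ x_hat, residual))   # approximately 0
```

At least for this data, both routes give the same slope $1/2$ and bias $2/3$.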
This solution leaves me with some questions:

1. Why are there two derivatives, one for each of $x_1$ and $x_2$?
2. Why are partial derivatives used (is it because we are working in higher dimensions)?
3. Why is there a constant $2$ in front of the summation in $\frac{\partial}{\partial x_1}||Ax-b||^2 = 2\sum_{i=1}^{n}a_i(x_1a_i+x_2-b_i) = 0$ (isn't this just the usual power rule for derivatives)?

In short, how can a minimization problem be solved with partial derivatives?
Thank you!