I had a few questions about the derivation of linear regression.
$$\text{SSE} = \sum_{i=1}^{N} (y_i - b_0 - b_1 x_i)^2$$
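Setting both partial derivatives to zero and solving (my own algebra, so please check) gives

$$\frac{\partial \text{SSE}}{\partial b_0} = -2\sum_{i=1}^{N}(y_i - b_0 - b_1 x_i) = 0, \qquad \frac{\partial \text{SSE}}{\partial b_1} = -2\sum_{i=1}^{N} x_i\,(y_i - b_0 - b_1 x_i) = 0,$$

which leads to

$$b_1 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1\bar{x}.$$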
In the above example, I simply found the values of $b_0$ and $b_1$ at which the SSE is minimised by setting the partial derivatives with respect to $b_0$ and $b_1$ to zero. I had a few questions about this:
I know from calculus that setting the first derivative to zero gives a stationary point, which could be a minimum or a maximum. In the case of linear regression, most examples I have seen simply assume that the point found from the first derivatives is the minimum of the error function; I never saw anyone check the second derivatives to confirm this. Is there a reason why, or are those examples just incomplete?
Using gradient descent we can find the minimum of a function step by step. Why do we need gradient descent if I can just do what I did for linear regression (i.e. take the partial derivatives and solve)? Could someone cite some examples (hopefully with links) where this won't work and gradient descent is needed?
Thanks
Yes, the solution is (technically) incomplete. However, it isn't hard to show that the Hessian is positive definite, and you should give this a go.
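For what it's worth, here is one way that check can go, using the SSE and notation from the question. The second-order partial derivatives are

$$\frac{\partial^2 \text{SSE}}{\partial b_0^2} = 2N, \qquad \frac{\partial^2 \text{SSE}}{\partial b_1^2} = 2\sum_{i=1}^{N} x_i^2, \qquad \frac{\partial^2 \text{SSE}}{\partial b_0\,\partial b_1} = 2\sum_{i=1}^{N} x_i,$$

so the Hessian is

$$H = 2\begin{pmatrix} N & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}.$$

Its leading principal minors are $2N > 0$ and $\det H = 4\left(N\sum_i x_i^2 - \big(\sum_i x_i\big)^2\right) = 4N\sum_i (x_i - \bar{x})^2 > 0$ as long as the $x_i$ are not all equal, so $H$ is positive definite and the stationary point found from the first derivatives is the global minimum (the SSE is a convex quadratic in $b_0, b_1$).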
Gradient descent is essentially a numerical method for finding the minimum of a function. In principle, you'd use it if you either didn't know in advance which function you were going to minimise, or if no analytic solution were possible, e.g. the minimum of $x\cos x$ over $[0,2\pi]$.
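As a rough illustration of that last example (a sketch only; the step size, starting point, and iteration count below are arbitrary choices), a basic gradient-descent loop for $f(x) = x\cos x$ on $[0, 2\pi]$ in Python could look like this:

```python
import math

def f(x):
    # f(x) = x cos x: its minimiser on [0, 2*pi] has no closed form
    return x * math.cos(x)

def df(x):
    # f'(x) = cos x - x sin x
    return math.cos(x) - x * math.sin(x)

def gradient_descent(x0, lr=0.01, steps=10_000):
    x = x0
    for _ in range(steps):
        x = x - lr * df(x)                  # step against the gradient
        x = min(max(x, 0.0), 2 * math.pi)   # project back onto [0, 2*pi]
    return x

x_min = gradient_descent(x0=2.0)
print(x_min, f(x_min))  # roughly x ≈ 3.43, f(x) ≈ -3.29
```

Note that the point you converge to depends on the starting value and step size: from some starting points this loop ends up stuck at the boundary $x = 0$ rather than at the global minimum near $3.43$, so initialisation matters.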