In class, we were taught that if a function $z=f(x,y)$ is differentiable at $(x_0,y_0)$, then the first-order partial derivatives $f_x(x_0,y_0)$ and $f_y(x_0,y_0)$ exist at that point, and the function satisfies the "increment theorem", i.e. $\Delta z=f_x(x_0,y_0)\Delta x+f_y(x_0,y_0)\Delta y + \epsilon \Delta x+\eta\Delta y$, where $\epsilon \to 0$ and $\eta\to0$ as $(\Delta x, \Delta y) \to (0,0)$. As I understand it, this expresses the intuition that the function can be approximated locally by a linear function. However, as a computational check for differentiability, we were taught this equation:
$$\lim_{(h,k) \to (0,0)} \dfrac{f(x_0+h,y_0+k)-f(x_0,y_0)-f_x(x_0,y_0)\,h-f_y(x_0,y_0)\,k}{\sqrt{h^2+k^2}}=0$$
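As a concrete illustration, here is a minimal numerical sketch of this check in Python. It evaluates the quotient $\dfrac{f(h,k)-f(0,0)-f_x(0,0)\,h-f_y(0,0)\,k}{\sqrt{h^2+k^2}}$ at the origin for $f(x,y)=\dfrac{x^2y^3}{x^4+y^4}$. The choice of the origin, the convention $f(0,0)=0$, the finite-difference step, and the helper names `partials_at_origin`, `quotient`, and `quotient_along` are all my own assumptions for illustration. A numerical experiment like this can only suggest what the limit does; it is not a proof.

```python
import math


def f(x, y):
    """f(x, y) = x^2 y^3 / (x^4 + y^4), with f(0, 0) = 0 (assumed extension)."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**2 * y**3 / (x**4 + y**4)


def partials_at_origin(step=1e-6):
    """Estimate f_x(0,0) and f_y(0,0) with one-sided difference quotients."""
    fx = (f(step, 0.0) - f(0.0, 0.0)) / step
    fy = (f(0.0, step) - f(0.0, 0.0)) / step
    return fx, fy


def quotient(h, k):
    """Quotient from the differentiability check, taken at (x0, y0) = (0, 0)."""
    fx, fy = partials_at_origin()
    return (f(h, k) - f(0.0, 0.0) - fx * h - fy * k) / math.hypot(h, k)


def quotient_along(direction, sizes=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Evaluate the quotient at shrinking points t * direction."""
    dx, dy = direction
    return [quotient(t * dx, t * dy) for t in sizes]


if __name__ == "__main__":
    print("along the x-axis:  ", quotient_along((1.0, 0.0)))
    print("along the line y=x:", quotient_along((1.0, 1.0)))
```

If I have this right, the quotient stays at $0$ along the $x$-axis but stays near $\tfrac{1}{2\sqrt{2}}$ along the line $y=x$, so the two-variable limit cannot be $0$ for this function at the origin.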
How do you go from the increment theorem to the above limit? I'm not very familiar with typical $\epsilon$-$\delta$ arguments (I'm an engineering student). Does this rely on them? I thought this might rely on converting some multivariable limit from Cartesian to polar coordinates, but I can't figure out the details if that is the case.
Furthermore, I did some reading and apparently there are two kinds of differentiability: weak (Gâteaux) and strong (Fréchet), where the latter implies the former. Certain functions whose first-order partial derivatives are discontinuous (like $f(x,y)=\dfrac{x^5+y^3}{x^4+y^2}$ or $f(x,y)=\dfrac{x^2y^3}{x^4+y^4}$) are said to be "Gâteaux differentiable" but not Fréchet differentiable.
How does the above definition of differentiability fit into this framework? I think it is the definition of Gâteaux differentiability simplified for two-variable functions, but I'm not sure. Also, continuity of the partial derivatives implies differentiability, but not vice versa. Is there a proof of this claim? What are the common counterexamples, i.e. differentiable functions with discontinuous first-order partial derivatives?
Lastly, how does differentiability in the Fréchet sense, in the Gâteaux sense, and in the sense of the above equation cement the relationship between the directional derivative and the gradient? We learned in class that the fundamental definition of the directional derivative is $$D_{\hat{u}}f(P)=\lim_{h\to0} \dfrac{f(x_0+hu_x,y_0+hu_y)-f(x_0,y_0)}{h}$$ For differentiable functions, this becomes $D_{\hat{u}}f(P)=\nabla f(P)\cdot\hat{u}$. I don't understand why the existence of the first-order partial derivatives by itself is not a sufficient condition for this; why must the function also be differentiable?
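To show the kind of mismatch I am asking about, here is a minimal Python sketch that compares the limit definition of the directional derivative with $\nabla f(P)\cdot\hat{u}$ for $f(x,y)=\dfrac{x^2y^3}{x^4+y^4}$ at $P=(0,0)$. The point $P$, the convention $f(0,0)=0$, the direction $\hat{u}=(1/\sqrt{2},1/\sqrt{2})$, the step sizes, and the helper names `directional_derivative` and `gradient_dot` are assumptions chosen for illustration. Both quantities are finite-difference estimates, not exact values.

```python
import math


def f(x, y):
    """f(x, y) = x^2 y^3 / (x^4 + y^4), with f(0, 0) = 0 (assumed extension)."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**2 * y**3 / (x**4 + y**4)


def directional_derivative(u, h=1e-6):
    """Difference quotient (f(P + h*u) - f(P)) / h at P = (0, 0)."""
    ux, uy = u
    return (f(h * ux, h * uy) - f(0.0, 0.0)) / h


def gradient_dot(u, step=1e-6):
    """Estimate grad f(0,0) from one-sided differences, then dot it with u."""
    fx = (f(step, 0.0) - f(0.0, 0.0)) / step
    fy = (f(0.0, step) - f(0.0, 0.0)) / step
    return fx * u[0] + fy * u[1]


if __name__ == "__main__":
    u = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))  # unit vector along y = x
    print("limit definition:", directional_derivative(u))
    print("gradient dot u:  ", gradient_dot(u))
```

Both partial derivatives at the origin come out as $0$, so $\nabla f(P)\cdot\hat{u}=0$, yet the difference quotient along $\hat{u}$ stays near $\tfrac{1}{2\sqrt{2}}$. So for this function the partial derivatives exist at $P$, but the formula $D_{\hat{u}}f(P)=\nabla f(P)\cdot\hat{u}$ appears to fail.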