(2) isn’t the right definition. It only makes sense if $(x,y)$ is a local coordinate system for $M$; in particular, these are not necessarily the same $x,y$ that appear in $F(x,y,z)=0$. Where exactly did you see (2)?
Anyway, I know of four equivalent definitions of submanifolds of $\Bbb{R}^n$; see this question and my answer for the equivalence. Each of these four definitions comes with a corresponding description of the tangent spaces.
Let’s start with a somewhat general but intuitive definition:
Definition.
Let $M$ be a $k$-dimensional embedded submanifold of $\Bbb{R}^n$ and $p\in M$. We define $T_pM$ to be the set of all velocity vectors $\gamma'(0)\in\Bbb{R}^n$ of smooth curves $\gamma:(-\epsilon,\epsilon)\to \Bbb{R}^n$ such that $\text{image}(\gamma)\subset M$ and $\gamma(0)=p$.
In the more general setting of abstract manifolds, we can’t speak of $\gamma'(0)$ directly as a limit of difference quotients; instead, we need to work with equivalence classes of smooth curves. This is just one extra layer of definitions, and it won’t affect the core of what I’m about to say below.
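In $\Bbb{R}^n$, though, $\gamma'(0)$ really is a limit of difference quotients, so it can be approximated numerically. Here is a minimal Python/NumPy sketch, assuming the hypothetical example $M=S^2\subset\Bbb{R}^3$, $p=(0,0,1)$ and the curve $\gamma(t)=(t,2t,1)/\sqrt{1+5t^2}$ (which stays on the sphere); the helper name `velocity` is made up for illustration.

```python
import numpy as np

def velocity(gamma, h=1e-6):
    """Approximate gamma'(0) by the central difference quotient (gamma(h) - gamma(-h)) / (2h)."""
    return (gamma(h) - gamma(-h)) / (2 * h)

def gamma(t):
    """Hypothetical smooth curve with image in S^2 and gamma(0) = p = (0, 0, 1)."""
    c = np.array([t, 2 * t, 1.0])
    return c / np.linalg.norm(c)  # dividing by the norm keeps the image on the unit sphere

p = gamma(0.0)
v = velocity(gamma)
print(v)             # approximately (1, 2, 0)
print(np.dot(v, p))  # approximately 0: the velocity is orthogonal to p
```

The exact velocity here is $\gamma'(0)=(1,2,0)$, which is orthogonal to $p$, as one expects for a sphere.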
Theorem (Various Descriptions of Tangent Spaces).
Let $M$ be a $k$-dimensional embedded submanifold of $\Bbb{R}^n$ and let $p\in M$. Fix the following notation: suppose $U\subset \Bbb{R}^n$ is an open neighbourhood of $p$ such that we have the following:
Graph description of $M$: there is an open subset $A\subset\Bbb{R}^k$ and a smooth function $f:A\to\Bbb{R}^{n-k}$ such that $M\cap U=\text{graph}(f)$ (strictly speaking, I should assume there is a permutation $\sigma$ of the coordinates such that $\sigma[M\cap U]=\text{graph}(f)$). Let $a\in A$ be the point such that $p=(a,f(a))$.
Local level set: there is a smooth submersion $F:U\to\Bbb{R}^{n-k}$ and a $c\in\Bbb{R}^{n-k}$ such that $M\cap U=F^{-1}(\{c\})$.
Local slice chart: there is an open set $V\subset\Bbb{R}^n$ and a diffeomorphism $\Phi:U\to V$ such that $M\cap U=\Phi^{-1}[V\cap (\Bbb{R}^k\times\{0_{\Bbb{R}^{n-k}}\})]$.
Local parametrization: there is an open set $W\subset\Bbb{R}^k$ and a smooth map $\alpha:W\to\Bbb{R}^n$ such that $\text{image}(\alpha)=M\cap U$, $\alpha$ is an injective immersion, and $\alpha^{-1}:M\cap U\to W$ is continuous.
With notation as above, we have the following equalities for the tangent space $T_pM$:
\begin{align}
T_pM=\text{graph}(Df_a)=\ker(DF_p)=(D\Phi_p)^{-1}(\Bbb{R}^k\times\{0\})=\text{image}(D\alpha_{\alpha^{-1}(p)}).
\end{align}
In particular, $T_pM$ is a $k$-dimensional subspace of $\Bbb{R}^n$.
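To see the theorem in a concrete case, here is a minimal NumPy sketch for the hypothetical example $M=S^2\subset\Bbb{R}^3$ ($n=3$, $k=2$) near a point $p$ of the upper hemisphere. The choices are my own, not from the question: $f(x,y)=\sqrt{1-x^2-y^2}$ on the open unit disk, $F(x)=|x|^2$ with $c=1$, and $\alpha$ given by spherical coordinates on $W=(0,\pi)\times(0,2\pi)$. The derivatives are computed by hand, and the made-up helper `same_subspace` checks that two matrices have the same column space.

```python
import numpy as np

def rank(A):
    """Numerical rank with an explicit tolerance (hypothetical helper)."""
    return np.linalg.matrix_rank(A, tol=1e-10)

def same_subspace(A, B):
    """True if the column spaces of A and B coincide."""
    return rank(A) == rank(B) == rank(np.hstack([A, B]))

# Hypothetical example: the unit sphere S^2 in R^3, so n = 3 and k = 2.
a = np.array([0.3, -0.4])                       # a point of A = open unit disk
p = np.array([a[0], a[1], np.sqrt(1 - a @ a)])  # p = (a, f(a)) on the upper hemisphere

# Graph description: f(x, y) = sqrt(1 - x^2 - y^2), so Df_a = -a^T / f(a).
# graph(Df_a) is spanned by the vectors (e_i, Df_a e_i), i = 1, 2.
Df_a = -a / p[2]
graph_basis = np.column_stack([np.append(e, Df_a @ e) for e in np.eye(2)])

# Level-set description: F(x) = |x|^2 with c = 1, so DF_p = 2 p^T (a 1x3 matrix).
DF_p = 2 * p.reshape(1, 3)

# Parametrization: alpha(theta, phi) = (sin th cos ph, sin th sin ph, cos th),
# evaluated at (theta, phi) = alpha^{-1}(p) in W = (0, pi) x (0, 2 pi).
theta = np.arccos(p[2])
phi = np.arctan2(p[1], p[0]) % (2 * np.pi)
Dalpha = np.array([
    [np.cos(theta) * np.cos(phi), -np.sin(theta) * np.sin(phi)],
    [np.cos(theta) * np.sin(phi),  np.sin(theta) * np.cos(phi)],
    [-np.sin(theta),               0.0],
])

print(rank(graph_basis))                   # 2 = k
print(3 - rank(DF_p))                      # 2 = dim ker(DF_p), by rank-nullity
print(np.allclose(DF_p @ graph_basis, 0))  # True: graph(Df_a) lies in ker(DF_p)
print(np.allclose(DF_p @ Dalpha, 0))       # True: image(Dalpha) lies in ker(DF_p)
print(same_subspace(graph_basis, Dalpha))  # True
```

The slice-chart description is left out here only to keep the sketch short.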
All these equalities should be intuitive:
- $T_pM=\text{graph}(Df_a)$ is just saying that since $M$ is locally a graph, the tangent space (i.e. the linear approximation to $M$ at $p$) is the graph of the linear approximation $Df_a$ of $f$.
- $T_pM=\ker(DF_p)$ just says that since $M$ is locally a level set of $F$, the tangent space is the zero-level set of the linear approximation of $F$.
- $T_pM= (D\Phi_p)^{-1}(\Bbb{R}^k\times \{0\})$ just says that since $M\cap U$ is the preimage under $\Phi$ of an open subset of $\Bbb{R}^k\times\{0\}$, the tangent space is the preimage of $\Bbb{R}^k\times\{0\}$ under the linear approximation $D\Phi_p$.
- $T_pM=\text{image}(D\alpha_{\alpha^{-1}(p)})$ just says that since $M$ is locally the image of $\alpha$, the tangent space is the image of the linear approximation of $\alpha$ (at the respective point $\alpha^{-1}(p)$).
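The "linear approximation" intuition behind $T_pM=\ker(DF_p)$ can be checked numerically: along a direction $v\in\ker(DF_p)$, the quotient $(F(p+tv)-c)/t$ tends to $DF_p(v)=0$, so $F$ changes only to second order. A minimal sketch, assuming the hypothetical example $F(x)=|x|^2$, $c=1$, $p=(0,0,1)$; the helper name `quotient` is made up.

```python
import numpy as np

# Hypothetical example: F(x) = |x|^2 and c = 1, so M = F^{-1}({1}) is the unit sphere S^2.
def F(x):
    return x @ x

p = np.array([0.0, 0.0, 1.0])
c = 1.0
v_tangent = np.array([1.0, 2.0, 0.0])  # DF_p(v) = 2 p.v = 0, so v lies in ker(DF_p)
v_normal = p.copy()                    # DF_p(v) = 2 |p|^2 = 2, so v is not in ker(DF_p)

def quotient(v, t):
    """(F(p + t v) - c) / t, which tends to DF_p(v) as t -> 0."""
    return (F(p + t * v) - c) / t

for t in [1e-1, 1e-2, 1e-3]:
    print(t, quotient(v_tangent, t), quotient(v_normal, t))
```

For the sphere this can be done exactly: the tangent quotient equals $5t$ and the normal quotient equals $2+t$.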
Now, let’s outline the proof of the various equalities.
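- The fact that $T_pM=\text{graph}(Df_a)$ is the easiest of all, and I leave the details to you. Curves of the form $t\mapsto (a+tv, f(a+tv))$ lie in the graph and have velocity $(v,Df_a(v))$, which gives $\text{graph}(Df_a)\subset T_pM$. Conversely, after shrinking its domain, any smooth curve in $M$ through $p$ lies in $M\cap U$ and so has the form $t\mapsto(\beta(t),f(\beta(t)))$ with $\beta(0)=a$; by the chain rule its velocity is $(\beta'(0),Df_a(\beta'(0)))\in\text{graph}(Df_a)$. Since $v\mapsto (v,Df_a(v))$ is an injective linear map from $\Bbb{R}^k$ onto $\text{graph}(Df_a)$, this also shows that $\dim T_pM=k$.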
- To show $T_pM=\ker(DF_p)$, observe that for any smooth curve $\gamma:(-\epsilon,\epsilon)\to \Bbb{R}^n$ with image in $M$ and $\gamma(0)=p$, if we shrink $\epsilon$ enough then the image of $\gamma$ lies in $M\cap U$, so $F\circ \gamma$ is constantly equal to $c$. By the chain rule, $DF_p(\gamma'(0))=(F\circ\gamma)'(0)=0$, so $\gamma'(0)\in \ker(DF_p)$, and thus $T_pM\subset\ker(DF_p)$. The former is a $k$-dimensional subspace, as shown above, while the latter is $k$-dimensional by the rank-nullity theorem, since $DF_p:\Bbb{R}^n\to\Bbb{R}^{n-k}$ is surjective (by hypothesis). Hence the two spaces are equal.
- To show $T_pM=(D\Phi_p)^{-1}(\Bbb{R}^k\times \{0\})$, take any velocity vector $\gamma'(0)\in T_pM$. By shrinking the domain $(-\epsilon,\epsilon)$ sufficiently, we again have that $\gamma$ has image in $M\cap U$, so $\Phi\circ\gamma$ has image in $V\cap (\Bbb{R}^k\times\{0\})$, and therefore $(\Phi\circ\gamma)'(0)=D\Phi_p(\gamma'(0))\in\Bbb{R}^k\times\{0\}$ (since $(\Phi\circ\gamma)'(0)$ is a limit of difference quotients, each lying in $\Bbb{R}^k\times\{0\}$, which is a closed subset of $\Bbb{R}^n$). This shows $T_pM\subset (D\Phi_p)^{-1}(\Bbb{R}^k\times\{0\})$. Finally, since $D\Phi_p$ is an isomorphism, the latter is a $k$-dimensional space, so we have equality.
- Lastly, to show $T_pM=\text{image}(D\alpha_{\alpha^{-1}(p)})$, observe that for each $w\in\Bbb{R}^k$, by openness of $W$, there is an $\epsilon>0$ such that $\alpha^{-1}(p)+tw\in W$ for all $|t|<\epsilon$. So $\gamma:(-\epsilon,\epsilon)\to \Bbb{R}^n$ defined by $\gamma(t):=\alpha(\alpha^{-1}(p)+tw)$ is a smooth curve in $M\cap U$ through the point $p$. By the chain rule, $\gamma'(0)=D\alpha_{\alpha^{-1}(p)}(w)$, so $\text{image}(D\alpha_{\alpha^{-1}(p)})\subset T_pM$. Since $\alpha$ is an immersion, $D\alpha_{\alpha^{-1}(p)}:\Bbb{R}^k\to\Bbb{R}^n$ is an injective linear map, so its image has dimension $k$; since $T_pM$ is also $k$-dimensional, the two spaces are equal.
This proves the equality of all the descriptions of the tangent spaces to submanifolds.
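The two chain-rule computations used in the proof (velocities of $t\mapsto(a+tv,f(a+tv))$ and of $t\mapsto\alpha(\alpha^{-1}(p)+tw)$) can be sanity-checked with difference quotients. A minimal NumPy sketch, again assuming the hypothetical sphere example (upper-hemisphere graph $f$, spherical coordinates $\alpha$) and arbitrarily chosen directions $v,w$; all helper names are made up.

```python
import numpy as np

def velocity(gamma, h=1e-6):
    """Central difference quotient approximating gamma'(0)."""
    return (gamma(h) - gamma(-h)) / (2 * h)

# Hypothetical example: the unit sphere S^2 near a point p of the upper hemisphere.
def f(x):       # graph description: upper hemisphere over the open unit disk
    return np.sqrt(1 - x @ x)

def Df(x):      # Df_x as a row vector: -x^T / f(x)
    return -x / f(x)

def alpha(q):   # parametrization by spherical coordinates q = (theta, phi)
    th, ph = q
    return np.array([np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph), np.cos(th)])

def Dalpha(q):  # Jacobian of alpha (3 x 2), computed by hand
    th, ph = q
    return np.array([[np.cos(th) * np.cos(ph), -np.sin(th) * np.sin(ph)],
                     [np.cos(th) * np.sin(ph),  np.sin(th) * np.cos(ph)],
                     [-np.sin(th),              0.0]])

a = np.array([0.3, -0.4])
p = np.append(a, f(a))                                                 # p = (a, f(a))
q = np.array([np.arccos(p[2]), np.arctan2(p[1], p[0]) % (2 * np.pi)])  # q = alpha^{-1}(p)

v = np.array([1.0, 0.5])    # arbitrary direction in R^k for the graph curve
w = np.array([0.2, -1.0])   # arbitrary direction in R^k for the parametrization curve

def graph_curve(t):         # t -> (a + t v, f(a + t v))
    return np.append(a + t * v, f(a + t * v))

def param_curve(t):         # t -> alpha(alpha^{-1}(p) + t w)
    return alpha(q + t * w)

v_graph = velocity(graph_curve)
v_param = velocity(param_curve)
print(np.allclose(v_graph, np.append(v, Df(a) @ v), atol=1e-6))  # chain rule: (v, Df_a v)
print(np.allclose(v_param, Dalpha(q) @ w, atol=1e-6))            # chain rule: Dalpha w
print(np.allclose(2 * p @ np.column_stack([v_graph, v_param]), 0, atol=1e-6))  # both in ker(DF_p)
```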
Local Coordinate Description.
When talking about local coordinates for submanifolds, we’re really talking about the third definition, i.e. what I called the local slice chart definition. Observe that because $\{e_1,\dots, e_k\}$ is a basis for $\Bbb{R}^k\times\{0\}\subset \Bbb{R}^n$ and $D\Phi_p$ is an isomorphism, it follows that $\{(D\Phi_p)^{-1}(e_i)\}_{i=1}^k$ forms a basis for $T_pM$.
For each $i\in\{1,\dots, k\}$, let us define $\xi^i:= \Phi^i|_{M\cap U}$, i.e. the function $\xi^i$ is the $i^{th}$ component function of $\Phi:U\to V\subset\Bbb{R}^n$, restricted to $M\cap U$. Then the pair $(M\cap U, (\xi^1,\dots,\xi^k))$ is called a local coordinate system for $M$ around $p$. If you know about abstract manifolds, you can easily verify that the collection of all such charts forms a smooth atlas for $M$ (see any good book for more details, e.g. Lee’s book on smooth manifolds). Now, one introduces the following notation:
\begin{align}
\frac{\partial}{\partial \xi^i}(p):= (D\Phi_p)^{-1}(e_i)\in T_pM.\tag{$*$}
\end{align}
So, by my previous paragraph, one trivially has
\begin{align}
T_pM=\text{span}\left\{\frac{\partial}{\partial \xi^i}(p)\right\}_{i=1}^k.\tag{$**$}
\end{align}
Notice that the equality $(**)$ doesn’t have much content; it’s a simple consequence of the notation $(*)$ and the previously established fact that $T_pM=(D\Phi_p)^{-1}(\Bbb{R}^k\times\{0\})$. So the real question is why one introduces the notation $(*)$ at all. For that, you can read up on it in Lee’s book or search the site, because it has been asked several times.
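For a concrete instance of the notation $(*)$, consider the hypothetical slice chart for $S^2$ given by spherical coordinates, $\Phi(x)=(\theta(x),\varphi(x),|x|-1)$ on $U=\Bbb{R}^3\setminus\{(x,y,z):y=0,\ x\ge 0\}$ with $V=(0,\pi)\times(0,2\pi)\times(-1,\infty)$ (my own choice, not from the question). Here $\xi^1=\theta|_{M\cap U}$ and $\xi^2=\varphi|_{M\cap U}$. The sketch below (Python/NumPy, with a finite-difference Jacobian and made-up helper names `Phi` and `jacobian`) computes $(D\Phi_p)^{-1}(e_i)$ and checks that it agrees with the familiar vectors $\partial\alpha/\partial\theta$, $\partial\alpha/\partial\varphi$ of the spherical parametrization $\alpha$.

```python
import numpy as np

def Phi(x):
    """Hypothetical slice chart for S^2: x -> (theta, phi, |x| - 1).
    Defined on U = R^3 minus the closed half-plane {y = 0, x >= 0}, with image
    V = (0, pi) x (0, 2 pi) x (-1, inf); points of the sphere go into R^2 x {0}."""
    r = np.linalg.norm(x)
    return np.array([np.arccos(x[2] / r), np.arctan2(x[1], x[0]) % (2 * np.pi), r - 1])

def jacobian(G, x, h=1e-6):
    """Central-difference approximation of DG_x; column j approximates the partial along e_j."""
    return np.column_stack([(G(x + h * e) - G(x - h * e)) / (2 * h) for e in np.eye(len(x))])

a = np.array([0.3, -0.4])
p = np.array([a[0], a[1], np.sqrt(1 - a @ a)])  # a point of S^2 away from the excluded half-plane

DPhi_p = jacobian(Phi, p)
# Notation (*): d/dxi^i (p) := (DPhi_p)^{-1}(e_i) for i = 1, ..., k, here k = 2.
partials = np.column_stack([np.linalg.solve(DPhi_p, e) for e in np.eye(3)[:2]])

# Compare with the partials of alpha(theta, phi) at (theta, phi) = (xi^1(p), xi^2(p)).
theta, phi = Phi(p)[:2]
Dalpha = np.array([[np.cos(theta) * np.cos(phi), -np.sin(theta) * np.sin(phi)],
                   [np.cos(theta) * np.sin(phi),  np.sin(theta) * np.cos(phi)],
                   [-np.sin(theta),               0.0]])

print(np.allclose(partials, Dalpha, atol=1e-6))     # True
print(np.allclose(2 * p @ partials, 0, atol=1e-6))  # True: both lie in ker(DF_p) = T_pM
```

The agreement is expected: $\Phi^{-1}(\theta,\varphi,s)=(1+s)\,\alpha(\theta,\varphi)$, so $(D\Phi_p)^{-1}$ sends $e_1,e_2$ to $\partial\alpha/\partial\theta,\partial\alpha/\partial\varphi$. Finite differences are used only to avoid differentiating $\Phi$ by hand.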
Finally, note very carefully that the functions $\xi^i$ are not the same as the coordinate functions $x^i$ on $\Bbb{R}^n$. It is probably your double use of $x^i$ (or $(x,y)$) for both the coordinate functions on the ambient space and those on the submanifold that is confusing you. The bottom line is that you should review the definitions carefully.