As mentioned in the comments, if you understand (1), then you understand (3), since (3) is just an application of (1) on a $2m$-dimensional space, with the coordinate functions labelled in a funny manner (sure, there’s a deeper reason for the split, but abstractly, it’s the exact same idea).
Now, let me offer the following explanation in terms of differential geometry, since that’s what you seem to be after. Suppose, as in (1), you have an open set $\Omega\subset\Bbb{R}^m$ and a smooth (differentiable is enough, really) function $f:\Omega\to\Bbb{R}$ (you can actually replace the target $\Bbb{R}$ with any Banach space). Now suppose you have a smooth curve (again, differentiable is enough) $\gamma:I\to \Omega$, where $I$ is an open interval in $\Bbb{R}$, and let $t$ denote the coordinate on $I$. We already have the 1-form $df$, so we can consider the pullback $\gamma^*(df)$. By directly using formula (1) and the basic rules for pullbacks (additivity, compatibility with products, commuting with the exterior derivative, etc.), we see that
\begin{align}
\gamma^*(df)&=\gamma^*\left(\sum_{i=1}^m\frac{\partial f}{\partial x^i}\,dx^i\right)\\
&=\sum_{i=1}^m\left(\frac{\partial f}{\partial x^i}\circ\gamma\right)\,d(x^i\circ\gamma)\\
&=\sum_{i=1}^m\left(\frac{\partial f}{\partial x^i}\circ\gamma\right)\,(x^i\circ\gamma)'\,dt,
\end{align}
where in the last equality I am using the exact same idea as (1), in the special case where $m=1$, the open set $\Omega$ is the interval $I$, and the function $f$ is $x^i\circ\gamma$, the $i^{th}$ component of the curve $\gamma$. On the other hand, we can simplify the left side of this equation to get $\gamma^*(df)=d(\gamma^*f)=d(f\circ\gamma)=(f\circ\gamma)'\,dt$, so in other words,
\begin{align}
(f\circ\gamma)'\,dt&= \sum_{i=1}^m\left(\frac{\partial f}{\partial x^i}\circ\gamma\right)\,(x^i\circ\gamma)'\,dt.
\end{align}
This is really the content of equation (2). If you like, note that since $dt$ is a nowhere-vanishing 1-form on a 1-dimensional domain, we can compare coefficients, and it follows that
\begin{align}
(f\circ\gamma)'&= \sum_{i=1}^m\left(\frac{\partial f}{\partial x^i}\circ\gamma\right)\,(x^i\circ\gamma)',
\end{align}
and this is exactly what equation (2) says, but this way of writing it is more precise. If one wants to be lazy, one can abuse notation, avoid mentioning $\gamma$ anywhere, and use the typical Leibniz notation to recover the beloved equation $\frac{df}{dt}=\sum_{i=1}^m\frac{\partial f}{\partial x^i}\frac{dx^i}{dt}$ (here the equals sign really means that the two sides are equal provided the person reading or writing them knows their true meaning). But anyway, as you can see from here, this is nothing but the chain rule, and the concept of pullback is nothing more (here anyway) than substituting things correctly.
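As a quick sanity check, here is a small worked example. The function and curve are my own choice for illustration and do not come from the question: take $m=2$, $f=(x^1)^2\,x^2$ on $\Omega=\Bbb{R}^2$, and $\gamma(t)=(\cos t,\sin t)$ on $I=\Bbb{R}$. The precise formula above gives
\begin{align}
\sum_{i=1}^2\left(\frac{\partial f}{\partial x^i}\circ\gamma\right)\,(x^i\circ\gamma)'
&=(2\cos t\,\sin t)(-\sin t)+(\cos^2 t)(\cos t)\\
&=-2\cos t\,\sin^2 t+\cos^3 t,
\end{align}
while substituting first and then differentiating gives
\begin{align}
(f\circ\gamma)'=\frac{d}{dt}\left(\cos^2 t\,\sin t\right)=-2\cos t\,\sin^2 t+\cos^3 t,
\end{align}
so the two sides agree, as they should.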
You can also consider a higher-dimensional analogue of (2), where you pull back not by a curve $\gamma:I\to\Omega$, but by a differentiable function $g:\Omega'\to\Omega$, where $\Omega'$ is an open set in some other $\Bbb{R}^k$, say. See also "Clarifying the chain rule terminology in differential geometry calculations" for further remarks.
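For concreteness, here is a sketch of how that higher-dimensional version plays out. The notation $y^1,\dots,y^k$ for the coordinates on the domain of $g$ is my own and does not appear in the question. Repeating the computation above with $g$ in place of $\gamma$ gives
\begin{align}
g^*(df)&=\sum_{i=1}^m\left(\frac{\partial f}{\partial x^i}\circ g\right)\,d(x^i\circ g)\\
&=\sum_{j=1}^k\left(\sum_{i=1}^m\left(\frac{\partial f}{\partial x^i}\circ g\right)\frac{\partial (x^i\circ g)}{\partial y^j}\right)dy^j,
\end{align}
while on the other hand $g^*(df)=d(f\circ g)=\sum_{j=1}^k\frac{\partial (f\circ g)}{\partial y^j}\,dy^j$. Since $dy^1,\dots,dy^k$ form a basis of the cotangent space at each point, comparing coefficients yields
\begin{align}
\frac{\partial (f\circ g)}{\partial y^j}&=\sum_{i=1}^m\left(\frac{\partial f}{\partial x^i}\circ g\right)\frac{\partial (x^i\circ g)}{\partial y^j},\qquad j=1,\dots,k,
\end{align}
which is the usual multivariable chain rule. The curve case is recovered by taking $k=1$ and $y^1=t$.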