If you're having trouble with the notation in differential geometry, my suggestion is to avoid Leibniz's notation entirely (temporarily) and write out everything in fully precise notation, being careful to distinguish what the function is from where it is being evaluated.
Let's first study what happens in $\Bbb{R}^n$ before moving on to the manifold case. Consider a differentiable map $F: \Bbb{R}^n \to \Bbb{R}$ and a differentiable curve $\lambda: \Bbb{R} \to \Bbb{R}^n$ (of course you can restrict everything to open subsets, but I don't feel like introducing too many new letters for new domains). We can now form the composite function $F \circ \lambda:\Bbb{R} \to \Bbb{R}$ and ask for its derivative at a point $u \in \Bbb{R}$, i.e., what is $(F \circ \lambda)'(u)$? The answer is of course to use the chain rule; see Loomis and Sternberg, page $148$, Theorem $7.2$. The result is that
\begin{align}
(F \circ \lambda)'(u) &= DF_{\lambda(u)}\left[ \lambda'(u) \right]
\end{align}
This is the application of the linear transformation $DF_{\lambda(u)}:\Bbb{R}^n \to \Bbb{R}$ to the "velocity vector of the curve" $\lambda'(u) \in \Bbb{R}^n$. Or if you prefer matrices (which I do not), you can think of this as the matrix multiplication of the $1 \times n$ matrix $DF_{\lambda(u)}$ with the $n \times 1$ matrix (or column vector) $\lambda'(u)$. To write this in terms of partial derivatives, just recall what the entries of each matrix are:
\begin{align}
(F \circ \lambda)'(u) &= \sum_{i=1}^n (\partial_iF)_{\lambda(u)} \cdot (\lambda^i)'(u) \\
&= \sum_{i=1}^n (\partial_iF)_{\lambda(u)} \cdot (\text{pr}^i\circ\lambda)'(u) \tag{$*$}
\end{align}
where $\lambda(\cdot) = \left( \lambda^1(\cdot), \dots, \lambda^n(\cdot)\right)$, or said differently, $\lambda^i := \text{pr}^i \circ \lambda$ is the $i^{th}$ coordinate function of the curve $\lambda$ (here $\text{pr}^i(a^1, \dots, a^n) := a^i$ is the function which assigns to each $n$-tuple its $i^{th}$ entry). Also, the notation $(\partial_iF)_{\lambda(u)}$ means you first calculate the $i^{th}$ partial derivative $\partial_iF: \Bbb{R}^n \to \Bbb{R}$, and afterwards evaluate it at $\lambda(u) \in \Bbb{R}^n$.
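To make $(*)$ concrete, here is a small worked example; the particular $F$ and $\lambda$ are my own choice for illustration, with $n = 2$. Take $F(a,b) := a^2 b$ and $\lambda(u) := (\cos u, \sin u)$. Then $(\partial_1F)_{(a,b)} = 2ab$, $(\partial_2F)_{(a,b)} = a^2$, $(\lambda^1)'(u) = -\sin u$ and $(\lambda^2)'(u) = \cos u$, so $(*)$ gives
\begin{align}
(F \circ \lambda)'(u) &= (\partial_1F)_{\lambda(u)} \cdot (\lambda^1)'(u) + (\partial_2F)_{\lambda(u)} \cdot (\lambda^2)'(u) \\
&= (2\cos u \sin u)(-\sin u) + (\cos^2 u)(\cos u) \\
&= \cos^3 u - 2\cos u \sin^2 u.
\end{align}
As a check, $(F \circ \lambda)(u) = \cos^2 u \, \sin u$, and differentiating this directly with the product rule gives the same expression.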
By using a chart $(U,\phi)$, the manifold case reduces directly to the $\Bbb{R}^n$ case. We wish to compute $(f \circ \gamma)'(u)$. Well, just write $f \circ \gamma$ as $(f \circ \phi^{-1}) \circ (\phi \circ \gamma)$ (which is valid wherever $\gamma$ takes values in $U$), so we're considering $F= f \circ \phi^{-1}$ and $\lambda = \phi \circ \gamma$. Now, using $(*)$, we get
\begin{align}
(f \circ \gamma)'(u) &= \sum_{i=1}^n \partial_i(f \circ \phi^{-1})_{(\phi \circ \gamma)(u)} \cdot \left( \text{pr}^i \circ \left(\phi \circ \gamma\right)\right)'(u)
\end{align}
This is the full answer written out in gory detail. Now, if you want to obtain the more familiar looking formula, you have to make some new definitions for the notation.
The first step is to call the chart map $x$ rather than $\phi$; so $x:U \to x[U] \subset \Bbb{R}^n$ is the chart map. Next, we define $x^i := \text{pr}^i \circ x: U \to \Bbb{R}$. With this, the above formula reads
\begin{align}
(f\circ \gamma)'(u) &= \sum_{i=1}^n\partial_i(f \circ x^{-1})_{x(\gamma(u))} \cdot (x^i \circ \gamma)'(u).
\end{align}
The second step is to introduce the following short-hand notation:
\begin{align}
\dfrac{\partial f}{\partial x^i} := \left[ \partial_i(f \circ x^{-1})\right] \circ x
\end{align}
Or if I evaluate at $\gamma(u) \in M$,
\begin{align}
\dfrac{\partial f}{\partial x^i}\bigg|_{\gamma(u)} \equiv \dfrac{\partial f}{\partial x^i}(\gamma(u)) := \partial_i(f \circ x^{-1})_{x(\gamma(u))}
\end{align}
The first $\equiv$ means "same thing, different notation" (it's just a matter of where you want to indicate the point of evaluation, so it's more of an aesthetic thing than a mathematical thing), but the $:=$ means it's a definition. The RHS is an ordinary partial derivative of a function $\Bbb{R}^n \to \Bbb{R}$, so it is something we already know, but the LHS is a new convenient symbol which we define in order to mimic the classical notation as much as possible. With this, the formula for $(f \circ \gamma)'(u)$ obtained from $(*)$ becomes
\begin{align}
(f \circ \gamma)'(u) &= \sum_{i=1}^n \dfrac{\partial f}{\partial x^i}\bigg|_{\gamma(u)} \cdot (x^i \circ \gamma)'(u)
\end{align}
Or if you insist on using Leibniz's notation, you can write this as
\begin{align}
\dfrac{d(f \circ \gamma)}{du}\bigg|_{u} &= \sum_{i=1}^n \dfrac{\partial f}{\partial x^i}\bigg|_{\gamma(u)} \cdot \dfrac{d(x^i \circ \gamma)}{du}\bigg|_u
\end{align}
The final step to making things look very classical is to completely avoid writing the compositions with $\gamma$ (so, don't write $\circ \gamma$ anywhere), and completely suppress where everything is being evaluated. Then, we get the nice familiar looking formula
\begin{align}
\dfrac{df}{du} &= \sum_{i=1}^n \dfrac{\partial f}{\partial x^i} \cdot \dfrac{dx^i}{du}
\end{align}
This is the form of the chain rule which you wrote in your very first equation (though for some reason you have $\frac{\partial}{\partial u}$ instead of $\frac{d}{du}$).
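Here is a worked example of the formulas above, under assumptions that are entirely my own choice (the manifold, chart, function and curve below are illustrative, not taken from your question). Let $M = \{(a,b) \in \Bbb{R}^2 : a > 0\}$ with the polar chart $x = (r, \theta): M \to (0,\infty) \times (-\pi/2, \pi/2)$, so $x^1 = r$ and $x^2 = \theta$, with $x^{-1}(\rho, \varphi) = (\rho\cos\varphi, \rho\sin\varphi)$. Take $f(a,b) := a$ and the curve $\gamma(u) := (1,u)$. Then $(f \circ x^{-1})(\rho,\varphi) = \rho\cos\varphi$, so by the definition of the short-hand notation,
\begin{align}
\dfrac{\partial f}{\partial r} = \left[\partial_1(f \circ x^{-1})\right] \circ x = \cos\theta, \qquad \dfrac{\partial f}{\partial \theta} = \left[\partial_2(f \circ x^{-1})\right] \circ x = -r\sin\theta.
\end{align}
Along the curve, $(r \circ \gamma)(u) = \sqrt{1+u^2}$ and $(\theta \circ \gamma)(u) = \arctan u$, while $\cos\theta$ and $r\sin\theta$ evaluated at $\gamma(u)$ are $1/\sqrt{1+u^2}$ and $u$ respectively. Hence
\begin{align}
(f \circ \gamma)'(u) &= \dfrac{\partial f}{\partial r}\bigg|_{\gamma(u)} \cdot (r \circ \gamma)'(u) + \dfrac{\partial f}{\partial \theta}\bigg|_{\gamma(u)} \cdot (\theta \circ \gamma)'(u) \\
&= \dfrac{1}{\sqrt{1+u^2}} \cdot \dfrac{u}{\sqrt{1+u^2}} + (-u) \cdot \dfrac{1}{1+u^2} = 0,
\end{align}
which agrees with the direct computation, since $(f \circ \gamma)(u) = 1$ is constant.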
Remark.
I highly recommend you watch the series of Lectures on General Relativity by Frederic Schuller, and in particular this one. The first $6$ lectures provide an amazing introduction to the language of smooth manifolds, tangent spaces, tangent bundles, vector fields, and covector fields (one-forms). (Of course, you should watch as many as you can, but for the very basics of differential geometry, you should watch at least the first $6$.)
EDIT: (in response to comment)
You have quite a few typos, and you've incorrectly applied the chain rule. Yes, there should be a $\text{pr}^j$ in the "numerator". The chain rule says (with summation over the repeated index $j$ implied)
\begin{align}
\partial_i(f \circ \psi \circ \phi^{-1}) \bigg|_{\phi(p)} &= \partial_j(f \circ \tilde{\phi}^{-1})\bigg|_{\tilde{\phi}(\psi(p))} \cdot \partial_i(\text{pr}^j \circ \tilde{\phi} \circ \psi \circ \phi^{-1})\bigg|_{\phi(p)}
\end{align}
As I explained in the first and second steps above, if we instead call the chart on $M$ $(U,x)$ and the chart on $M'$ $(V,y)$, then based on how I defined the notation above, this equality can be written as
\begin{align}
\frac{\partial(f \circ \psi)}{\partial x^i}\bigg|_{p} = \frac{\partial f}{\partial y^j}\bigg|_{\psi(p)} \cdot \frac{\partial(y^j \circ \psi)}{\partial x^i}\bigg|_{p} \tag{$\sharp$}
\end{align}
This says exactly the same thing as the previous equality, simply because of how I defined the notation. I think this is a perfectly good notation (though it takes a little time to get used to, in order to reconcile this new definition involving charts with how we ordinarily think of things). However, what I find absolutely terrible (at least when first learning the subject) is to completely avoid the composition symbol $\circ \psi$ and write
\begin{align}
\dfrac{\partial f}{\partial x^i} &= \dfrac{\partial f}{\partial y^j}\cdot \dfrac{\partial y^j}{\partial x^i}
\end{align}
This is an abuse of notation because the $f$ appearing on the two sides of the equation means different things, and the $y^j$ also has two different meanings, while in $(\sharp)$, we're not reusing the same symbol for two different purposes.
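To see the difference in a case where everything can be computed by hand, here is a one-dimensional illustration (the specific maps are my own choice, purely for illustration). Let $M = M' = \Bbb{R}$ with the identity charts $x = y = \text{id}_{\Bbb{R}}$, and take $\psi(p) := p^2$ and $f(q) := \sin q$. Then $(\sharp)$ reads
\begin{align}
\frac{\partial(f \circ \psi)}{\partial x}\bigg|_{p} = \frac{\partial f}{\partial y}\bigg|_{\psi(p)} \cdot \frac{\partial(y \circ \psi)}{\partial x}\bigg|_{p}, \qquad \text{i.e.} \qquad 2p\cos(p^2) = \cos(p^2) \cdot 2p.
\end{align}
In the abbreviated version, the "$f$" on the left would have to mean $p \mapsto \sin(p^2)$ while the "$f$" on the right means $q \mapsto \sin q$, and the "$y$" in $\frac{\partial y}{\partial x}$ would have to mean $p \mapsto p^2$ rather than the identity chart.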