I've been trying to get a better understanding of how Runge-Kutta methods are derived by reading the explanations found, for example, in this and this answer. I am, however, a bit confused as to exactly what kind of approximation we are using for the integral.
Preliminaries and context
Following the notation of this answer, consider an initial value problem $x'(t)=f(t,x(t))$.
Given a step size $h$, the naive way to compute $x(t+h)$ would be to use the Taylor approximation $\int_t^{t+h}g\simeq hg(t)+\frac{h^2}{2}g'(t)$ with $g$ the function $t\mapsto f(t,x(t))$, thus obtaining: $$x(t+h)= x(t)+\int_t^{t+h}\!d\tau f(\tau,x(\tau)) \simeq x(t)+hf(t,x(t))+\frac{h^2}{2}[\underbrace{f_t(t,x(t))+f_x(t,x(t))f(t,x(t))}_{\equiv f_t+f_xf}].$$ I reckon we do not do this, however, because we do not want an expression with $f_t$ or $f_x$. Fine. We then go and try a different approach, which is to write $$\int_t^{t+h}\!d\tau f(\tau,x(\tau))\simeq h\sum_{i=1}^N \omega_i f(t+\nu_i h,x(t+\nu_i h)),$$ for some yet to be determined coefficients $\nu_i$ and $\omega_i$. This still looks fine, as it seems that we are simply going for a Newton-Cotes approximation of the integral. However, if we were actually doing a Newton-Cotes approximation, the $\omega_i$ coefficients would be independent of $f$, and determined only by the way we decided to partition the interval $(t,t+h)$ (that is, by the coefficients $\nu_i$).
So we are not doing Newton-Cotes, I guess because that would require us to know $x(t+\nu_i h)$, which we still don't know. Ok. We are instead trying a different kind of approximation for the integral, which consists in writing $$x(t+h)-x(t)=\int_t^{t+h}\!d\tau f(\tau,x(\tau))\simeq\sum_i \omega_i K_i,$$ with \begin{align} &K_1\equiv h f(t,x(t)), \\ &K_2\equiv h f(t+\alpha h,x(t)+\beta_1 K_1), \\ &K_3\equiv h f(t+\alpha'h,x(t)+\beta_1' K_1+\beta_2' K_2), \end{align} and so on. We then Taylor-expand the $x(t+h)-x(t)$ term on the LHS and choose the parameters so that the two sides agree up to the desired order in $h$.
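To make the structure concrete, here is a minimal Python sketch of the two-stage case as I understand it. For two stages, matching the Taylor expansion through order $h^2$ gives the conditions $\omega_1+\omega_2=1$, $\omega_2\alpha=\tfrac12$ and $\omega_2\beta_1=\tfrac12$. The function names `rk2_step` and `integrate`, the default coefficients (the midpoint choice $\alpha=\beta_1=\tfrac12$, $\omega=(0,1)$) and the test problem $x'=x$ are my own illustrative assumptions, not something taken from the linked answers.

```python
import math

def rk2_step(f, t, x, h, alpha=0.5, beta1=0.5, omega=(0.0, 1.0)):
    """One explicit two-stage step using the K_i structure.

    The defaults are the midpoint-method coefficients, an illustrative
    choice satisfying omega1 + omega2 = 1 and omega2*alpha = omega2*beta1 = 1/2.
    """
    k1 = h * f(t, x)                           # K_1 = h f(t, x(t))
    k2 = h * f(t + alpha * h, x + beta1 * k1)  # K_2 uses K_1 to guess x inside the step
    return x + omega[0] * k1 + omega[1] * k2   # x(t+h) ~ x(t) + sum_i omega_i K_i

def integrate(f, t0, x0, t_end, n):
    """Apply n equal steps of rk2_step from t0 to t_end."""
    h = (t_end - t0) / n
    t, x = t0, x0
    for _ in range(n):
        x = rk2_step(f, t, x, h)
        t += h
    return x

# Assumed test problem: x' = x, x(0) = 1, whose exact solution at t = 1 is e.
f = lambda t, x: x
err_coarse = abs(integrate(f, 0.0, 1.0, 1.0, 50) - math.e)
err_fine = abs(integrate(f, 0.0, 1.0, 1.0, 100) - math.e)

# Halving h should shrink the error by a factor near 4 for a second-order method.
print(err_coarse / err_fine)
```

The point of the sketch is that each $K_i$ only ever evaluates $f$ at a state built from already computed $K_j$, $j<i$, so no value of $x$ inside the step has to be known in advance.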
Actual question
I do not understand what sort of approximation this is. Why use this specific kind of structure for the $K_i$? Is there any intuition or justification behind this choice, apart from the mere fact that it works?