As far as I have been able to tell, there isn't a rigorous rationale for the Padé approximant. The most recent works that I found on these approximants are the two books written by Baker:
- Padé Approximants (Encyclopedia of Mathematics and its Applications)
- Essentials of Padé Approximants
I haven't found any paper on the arXiv that explains why Padé approximants are so effective. Most people use Padé approximants inside numerical solvers and observe that the results are better than those obtained from truncated Taylor series.
A Padé approximant is a particular type of rational approximation whose
power series expansion agrees with a given power series to the highest possible order. The $L,M$ Padé approximant is denoted by
$$[L/M] = \frac{P_L(x)}{Q_M(x)}$$
where $P_L(x)$ is a polynomial of degree less than or equal to $L$, and $Q_M(x)$ is a polynomial of degree less than or equal to $M$. The formal power series
$$f(x) = \sum _{j=0}^\infty f_jx^j$$
determines the coefficients of $P_L$ and $Q_M$ through the equation
$$f(x) -\frac{P_L(x)}{Q_M(x)}=O(x^{L+M+1})$$
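To make the defining condition concrete, here is a minimal sketch of how the coefficients can be computed. It assumes the usual normalization $Q_M(0)=1$ (not stated above) and multiplies the condition through by $Q_M$, so that the coefficients of $x^{L+1},\dots,x^{L+M}$ in $f(x)Q_M(x)$ must vanish; this gives a linear system for the denominator, after which the numerator follows by direct multiplication. The names `solve_linear` and `pade` are my own, and exact rational arithmetic is used only to keep the example free of rounding questions.

```python
from fractions import Fraction
from math import factorial

def solve_linear(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: this [L/M] approximant is degenerate")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        # Eliminate the column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def pade(f, L, M):
    """Return (p, q), the coefficients of P_L and Q_M (lowest degree first).

    f holds the series coefficients f_0, ..., f_{L+M}; q[0] is fixed to 1.
    """
    if len(f) < L + M + 1:
        raise ValueError("need at least L + M + 1 series coefficients")
    f = [Fraction(c) for c in f]

    def coeff(i):
        return f[i] if i >= 0 else Fraction(0)

    # Coefficient of x^k in f*Q must vanish for k = L+1, ..., L+M.
    A = [[coeff(k - j) for j in range(1, M + 1)] for k in range(L + 1, L + M + 1)]
    b = [-coeff(k) for k in range(L + 1, L + M + 1)]
    q = [Fraction(1)] + solve_linear(A, b)
    # Coefficient of x^k in f*Q equals p_k for k = 0, ..., L.
    p = [sum(q[j] * coeff(k - j) for j in range(min(k, M) + 1)) for k in range(L + 1)]
    return p, q

# Example: the [2/2] approximant of exp(x), from f_k = 1/k!.
exp_series = [Fraction(1, factorial(k)) for k in range(5)]
p_exp, q_exp = pade(exp_series, 2, 2)
print(p_exp, q_exp)
```

For $e^x$ this reproduces the familiar $[2/2]$ approximant $\left(1+\tfrac{x}{2}+\tfrac{x^2}{12}\right)/\left(1-\tfrac{x}{2}+\tfrac{x^2}{12}\right)$. When the linear system is singular the approximant is degenerate, which the sketch reports rather than handles.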
It is well known that Padé approximants can outperform a truncated Taylor expansion when the function has poles, because a rational function can represent a pole while a polynomial cannot. However, the following open questions remain:
- The $[L/L+J]$ Padé approximants to any meromorphic function converge in measure within any bounded region of the complex plane as $L$ approaches infinity. Can this result be extended to all entire functions? Are there entire functions which cannot be represented by a Padé approximant?
- Baker has shown convergence properties for the series of Stieltjes and Pólya. Can these results be extended to other infinite series? How far can one extend the convergence properties of Padé approximants?
- How well can rational functions approximate the Riemann zeta function? Can we form a sequence of Padé approximants that converge uniformly to $\zeta$?
- Why do Padé approximants work better than other approximants? What is the mathematical reason why Padé approximants generally produce better results than a truncated Taylor series?
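To illustrate the empirical observation behind the last question (it illustrates the effect, it does not explain it), here is a small numerical comparison for $\log(1+x)$, whose Taylor series about $0$ converges only for $|x|<1$. The $[2/2]$ approximant used below, $\left(x+\tfrac{x^2}{2}\right)/\left(1+x+\tfrac{x^2}{6}\right)$, was worked out by hand from the defining condition with $Q_M(0)=1$; the function names and the evaluation point $x=2$ are arbitrary choices of mine.

```python
import math

def taylor_log1p(x, n):
    """Degree-n Taylor polynomial of log(1 + x) about x = 0."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, n + 1))

def pade22_log1p(x):
    """[2/2] Pade approximant of log(1 + x): (x + x^2/2) / (1 + x + x^2/6)."""
    return (x + x * x / 2) / (1 + x + x * x / 6)

x = 2.0  # outside the disc |x| < 1 where the Taylor series converges
exact = math.log1p(x)

# Both approximations use the same five series coefficients f_0, ..., f_4.
taylor_err = abs(taylor_log1p(x, 4) - exact)
pade_err = abs(pade22_log1p(x) - exact)

print(f"exact      = {exact:.6f}")
print(f"Taylor err = {taylor_err:.6f}")
print(f"Pade err   = {pade_err:.6f}")
```

Both approximations are built from the same five series coefficients, yet at $x=2$ the Taylor polynomial is off by more than $2$ while the Padé approximant is within $0.01$ of $\log 3$. Note that $\log(1+x)/x$ is a series of Stieltjes, so this example falls within the class where convergence is already understood.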
There are a lot of open questions about Padé approximants. I am currently in my second year of graduate school (studying applied/computational mathematics) and might be able to investigate some of these.