From the Wiki article: "Usually, it is much cheaper to calculate Ax (involving only calculation of differences, one multiplication and one division) than to calculate many more terms of the sequence x. Care must be taken, however, to avoid introducing errors due to insufficient precision when calculating the differences in the numerator and denominator of the expression."
Ideally, all you really need to do is compute the first few terms $x_n$. For the right sort of sequence, the few $y_n$ obtained by transforming those few $x_n$ will be VERY close to the limit, a value you would otherwise reach only by computing many more $x_n$.
The transformation works as follows: if $x_n = L+\epsilon_1^n + \epsilon_2^n$, where $L$ is the limit and $|\epsilon_2| < |\epsilon_1| < 1$, then one finds that $y_n \approx L + c\,\epsilon_2^n$ for some constant $c$. That is, the transformation eliminates the dominant source of the error.
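Here is a quick numerical check of that claim (a sketch, not part of the original argument; the constants $L = 1$, $\epsilon_1 = 0.5$, $\epsilon_2 = 0.1$ are just illustrative choices):

```python
# Numerical check: x_n = L + eps1**n + eps2**n, with |eps2| < |eps1| < 1.
# Illustrative constants, chosen only for this demo.
L, eps1, eps2 = 1.0, 0.5, 0.1

x = [L + eps1**n + eps2**n for n in range(12)]

# y_n = x_n - (x_{n+1} - x_n)^2 / (x_{n+2} - 2 x_{n+1} + x_n)
y = [x[n] - (x[n + 1] - x[n])**2 / (x[n + 2] - 2*x[n + 1] + x[n])
     for n in range(len(x) - 2)]

# The transformation removes the dominant eps1**n transient: the error
# of y_n decays roughly like eps2**n rather than eps1**n.
print(abs(x[9] - L))   # ~2e-3, of order eps1**9
print(abs(y[9] - L))   # of order eps2**9, several orders smaller
```

The error in $y_9$ comes out around $10^{-9}$, i.e. at the scale of $\epsilon_2^9$, while $x_9$ is still off by about $\epsilon_1^9 \approx 2\times 10^{-3}$.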
Also see Bender & Orszag, Ch. 8.1 on the Shanks transformation for an illustration of how the error reduction works. I'll show it for one term.
(By the way, I don't think the transformation is correct as you wrote it.)
Let $x_n = L + a \epsilon^n$. Then
$$y_n = x_n - \frac{(x_{n+1}-x_n)^2}{x_{n+2}-2 x_{n+1}+x_n} = L+a \epsilon^n - \frac{a (\epsilon^{n+1}-\epsilon^n)^2}{\epsilon^{n+2}-2 \epsilon^{n+1}+\epsilon^n} = L+a \epsilon^n - \frac{a \epsilon^{2n} (1-\epsilon)^2}{\epsilon^n(1-\epsilon)^2} = L+a \epsilon^n - a \epsilon^n = L$$
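To see the formula do something useful in practice, here is a sketch (my own, not from Bender & Orszag) applying it to the partial sums of the alternating harmonic series, which converge slowly to $\ln 2$; the choice of 10 terms is arbitrary:

```python
# Shanks / Aitken delta-squared applied to the partial sums of
# 1 - 1/2 + 1/3 - ... = ln 2.
import math

def shanks(x):
    """y_n = x_n - (x_{n+1} - x_n)^2 / (x_{n+2} - 2 x_{n+1} + x_n)."""
    return [x[n] - (x[n + 1] - x[n])**2 / (x[n + 2] - 2*x[n + 1] + x[n])
            for n in range(len(x) - 2)]

# Partial sums x_n = sum_{k=1}^{n} (-1)^{k+1} / k
x, s = [], 0.0
for k in range(1, 11):
    s += (-1)**(k + 1) / k
    x.append(s)

y = shanks(x)
print(abs(x[-1] - math.log(2)))  # ~5e-2: error of the 10th partial sum
print(abs(y[-1] - math.log(2)))  # ~1e-4: error after one transformation
```

Here the error is not a single geometric transient, so the transform does not kill it exactly as in the derivation above, but it still buys several digits from the same handful of terms. Note also the Wikipedia caveat quoted at the top: the denominator is a difference of nearly equal numbers, so in floating point you can lose precision if the $x_n$ are already very close together.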