Let me give you a simple example that captures the basic idea.
Suppose that we want to find the average of $N$ numbers $x_1, x_2, \ldots, x_N$. Let me call it $A(N)$. Now
$$
A(N) = \frac{x_1+x_2+\cdots+x_N}{N}$$
Now imagine you have already calculated $A(N)$ and then receive a new data point $x_{N+1}$. What is the average of the $N+1$ numbers?
$$
A(N+1) = \frac{x_1+x_2+\cdots+x_N+x_{N+1}}{N+1}$$
The key is that you do not have to calculate $A(N+1)$ from scratch. You can rewrite the above equation as
$$
\begin{aligned}
(N+1)\, A(N+1) &= x_1+x_2+\cdots+x_N+x_{N+1} \\
&= \left(x_1+x_2+\cdots+x_N\right)+x_{N+1} = N\, A(N)+x_{N+1}
\end{aligned}$$
Rearranging and simplifying, you get
$$
A(N+1) = A(N) + \frac{1}{N+1} \left(x_{N+1}-A(N)\right)$$
This is the recursive definition. It shows how to update the average with each new data value. We can write this as
$$
A_{\text{new}} = A_{\text{old}} + K \left(\text{data} - A_\text{old}\right)$$
There are two important parts to the equation above. $K$ is called the gain; here it is $\frac{1}{N+1}$. $\left(\text{data} - A_\text{old}\right)$ is called the innovation and is the difference between what you get and what you expect. Note that $K$ depends on how many samples you have already processed.
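As a quick illustration, here is a minimal Python sketch of this running-average update (the function and variable names are mine, chosen just for this example):

```python
def update_average(a_old, n_old, x_new):
    """Update a running average with one new data point.

    a_old : average of the first n_old values
    n_old : number of values already averaged
    x_new : the new data value
    Returns the average of the n_old + 1 values.
    """
    gain = 1.0 / (n_old + 1)        # K = 1/(N+1)
    innovation = x_new - a_old      # what you get minus what you expect
    return a_old + gain * innovation

# The recursive result matches the batch average.
data = [2.0, 4.0, 9.0, 5.0]
avg, count = 0.0, 0
for x in data:
    avg = update_average(avg, count, x)
    count += 1
print(avg, sum(data) / len(data))   # both print 5.0
```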
Now for recursive least squares on a linear model (I will write $y = a x + b$), you have the same structure
$$
\begin{pmatrix} a_\text{new} \\ b_\text{new} \end{pmatrix} =
\begin{pmatrix} a_\text{old} \\ b_\text{old} \end{pmatrix} +
\begin{pmatrix} K_{1} \\ K_{2} \end{pmatrix}
\left(y_\text{data} - (a_\text{old}\, x_\text{data} + b_\text{old})\right)$$
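In code, one such step, with the gain treated as already known (how to compute it recursively is described below), might look like this minimal sketch (the names are mine):

```python
import numpy as np

def line_update_step(a_old, b_old, K, x_data, y_data):
    """One update of the line parameters; K is the 2-element gain vector."""
    innovation = y_data - (a_old * x_data + b_old)     # data minus prediction
    a_new, b_new = np.array([a_old, b_old]) + K * innovation
    return a_new, b_new
```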
Added in response to a comment:
The formula for $K$ comes from the matrix inversion lemma, which gives a recursive way to compute it. The actual calculations are tedious and it would take me hours to type them here; consult any good book. I wanted to give you the concepts. The actual details, as with any algorithm, are all algebra. Here is the procedure:
- Write the formula for $N$ data points and the formula for $N+1$ data points.
- In the formula for $N+1$ data points, replace all expressions involving the first $N$ data points by the formula for $N$ data points.
- Rearrange and simplify.
- You will end up with an expression of the form $H^{-1}-(H+v v^T)^{-1}$, where $H$ is a matrix built from the first $N$ data points and $v$ is a vector built from the new data point.
- Use the matrix inversion lemma to get $H^{-1}-(H+v v^T)^{-1}=H^{-1}v v^T H^{-1}/(1+v^T H^{-1} v)$. (It actually turns out to be easier to write the recurrence directly for $H^{-1}$; see the numerical check after this list.)
- The gain $K$ can then be written in terms of $H^{-1}$ and the new data point.
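Here is a quick numerical check of that identity, a minimal sketch using NumPy with an arbitrary positive-definite $H$ and an arbitrary $v$:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary symmetric positive-definite H and an arbitrary vector v.
A = rng.standard_normal((3, 3))
H = A @ A.T + 3.0 * np.eye(3)
v = rng.standard_normal(3)

Hinv = np.linalg.inv(H)

# Left-hand side: H^{-1} - (H + v v^T)^{-1}, computed by brute force.
lhs = Hinv - np.linalg.inv(H + np.outer(v, v))

# Right-hand side: the matrix inversion (Sherman-Morrison) lemma.
rhs = Hinv @ np.outer(v, v) @ Hinv / (1.0 + v @ Hinv @ v)

print(np.allclose(lhs, rhs))   # True
```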
As with all such algorithms...it is details, details, details. Consult any good book.
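To make those details concrete, here is a minimal sketch of the whole recursion for the line $y = a x + b$, propagating $P = H^{-1}$ with the lemma above. The names, the large-$P$ initialization, and the test data are my own choices; a good book will also cover initialization and forgetting factors properly.

```python
import numpy as np

def rls_fit_line(xs, ys, p0=1e6):
    """Recursively fit y = a*x + b, one data point at a time.

    p0 : large initial value for P = H^{-1}, a common way to start
         the recursion when there is no prior information.
    """
    theta = np.zeros(2)          # [a, b]
    P = p0 * np.eye(2)           # P = H^{-1}
    for x, y in zip(xs, ys):
        v = np.array([x, 1.0])                 # regressor for this data point
        innovation = y - v @ theta             # data minus prediction
        K = P @ v / (1.0 + v @ P @ v)          # gain vector
        theta = theta + K * innovation         # update [a, b]
        P = P - np.outer(K, v @ P)             # matrix inversion lemma update
    return theta

# Noiseless data from y = 2x + 1 is recovered (up to the effect of the prior).
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs + 1.0
print(rls_fit_line(xs, ys))   # approximately [2., 1.]
```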
RLS is a special case of BLUE (best linear unbiased estimate), which itself is a special case of the Kalman filter. I chose to write the gains as $K$ in honor of Kalman, who gave the recursive formula in a much broader context.
Good luck.