I am having trouble applying Dvoretzky's stochastic approximation theorem to a lemma used in a paper I found on the convergence of some reinforcement learning temporal difference methods.
Jaakkola, Jordan, and Singh claim that the following lemma is a standard result and that it follows from Dvoretzky's theorem:
Lemma 1
A random process
$\omega_{n+1}(x) = (1-\alpha_n(x))\omega_n(x)+\beta_n(x)r_n(x)$
converges to zero with probability one if the following conditions are satisfied:
$\sum_n\alpha_n(x) = \infty, \: \sum_n \alpha^2_n(x) < \infty, \: \sum_n\beta_n(x) = \infty, \: \sum_n \beta_n^2(x) < \infty,$ and $\mathbb{E}[\beta_n(x) \mid P_n] \leq \mathbb{E}[\alpha_n(x) \mid P_n]$ uniformly with probability 1.
$\mathbb{E}[r_n(x) \mid P_n] = 0 $ and $\mathbb{E}[r_n^2(x) \mid P_n] \leq C $ w.p.1, where $ P_n = \{\omega_n, \omega_{n-1},...,r_{n-1},r_{n-2},...,\alpha_{n-1},\alpha_{n-2},...,\beta_{n-1},\beta_{n-2},...\}.$ All the random variables are allowed to depend on the past $P_n$.
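(As a sanity check, and not as part of the question itself, here is a quick numerical sketch of the process for a single state $x$, with the hypothetical choices $\alpha_n(x) = \beta_n(x) = 1/(n+1)$ and i.i.d. standard normal noise $r_n(x)$, which satisfy the conditions above; the iterate does drift to zero.)

```python
import numpy as np

# Quick simulation of the process in Lemma 1 for a single state x,
# with the hypothetical choices alpha_n = beta_n = 1/(n+1) and r_n ~ N(0, 1):
# sum alpha_n = inf, sum alpha_n^2 < inf, E[r_n | P_n] = 0, E[r_n^2 | P_n] = 1 <= C.
rng = np.random.default_rng(0)

N = 200_000
omega = 1.0                       # arbitrary starting value omega_1
for n in range(1, N + 1):
    alpha = beta = 1.0 / (n + 1)
    r = rng.standard_normal()
    omega = (1.0 - alpha) * omega + beta * r

# For this particular choice omega_{N+1} = (omega_1 + r_1 + ... + r_N) / (N + 1),
# so it is roughly of size 1/sqrt(N), i.e. about 2e-3 here.
print(omega)
```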
The proof in the paper is only the following:
Except for the appearance of $\beta_n(x)$ this is a standard result. With the above definitions convergence follows directly from Dvoretzky's extended theorem (Dvoretzky, 1956).
This lemma appears on page 11 of the paper by Jaakkola et al. (1993), in the proof of Theorem 1 (link at the end of the post).
Dvoretzky's theorem comes from a 1956 paper by Aryeh Dvoretzky ("On Stochastic Approximation"). I will state it here:
Dvoretzky's Theorem. Let $(\Omega, F, \mu)$ be a probability space and $\alpha_n$, $\beta_n$ and $\gamma_n$, $n = 1, 2, ... $, non-negative real numbers satisfying: \begin{equation} \lim_{n\to\infty}\alpha_n = 0, \end{equation} \begin{equation} \sum_{n=1}^{\infty}\beta_n < \infty, \end{equation} \begin{equation}\label{condDvoretzky:2.3} \sum_{n=1}^{\infty}\gamma_n = \infty. \end{equation}
Let $\theta$ be a real number and $T_n: \mathbb{R}^n \rightarrow \mathbb{R}$, $n = 1, 2, ... $, measurable functions satisfying \begin{equation} |T_n(r_1,...,r_n) - \theta| \leq \max(\alpha_n, (1+\beta_n) |r_n - \theta| - \gamma_n) \end{equation} for all real numbers $r_1,...,r_n$. Let $X_1$ and $Y_n$, $n = 1, 2, ... $, be random variables. We define \begin{equation} X_{n+1}(\omega) = T_n[X_1(\omega),...,X_n(\omega)] + Y_n(\omega), \quad n \geq 1. \end{equation}
Then, the conditions \begin{equation} \mathbb{E}[X_1^2] < \infty, \end{equation} \begin{equation}\label{condDvoretzky:2.6} \sum_{n=1}^{\infty}\mathbb{E}[Y_n^2] < \infty, \end{equation} and \begin{equation}\label{eq:condicionEsperanzaYDvoretzky} \mathbb{E}[Y_n \mid X_1, ..., X_n] = 0 \end{equation} w.p.1 for all $n$, imply \begin{equation} \label{condDvoretzky:2.8} \lim_{n\to\infty}\mathbb{E}[(X_n-\theta)^2] = 0 \end{equation} and \begin{equation} \label{condDvoretzky:2.9} \mathbb{P}(\lim_{n \to \infty}X_n = \theta) = 1. \end{equation}
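To get a feel for the contraction condition on $T_n$ (this toy instance is my own, it is not taken from either paper), note that the map that shrinks $r_n$ towards $\theta$ by $\gamma_n$ satisfies it even with $\alpha_n = \beta_n = 0$: \begin{equation} T_n(r_1,...,r_n) = \theta + \operatorname{sign}(r_n - \theta)\max\big(0, |r_n - \theta| - \gamma_n\big), \qquad |T_n(r_1,...,r_n) - \theta| = \max\big(0, |r_n - \theta| - \gamma_n\big), \end{equation} and the right-hand side is $\leq \alpha_n$ when $|r_n - \theta| \leq \gamma_n$ (it is zero there) and $\leq (1+\beta_n)|r_n - \theta| - \gamma_n$ otherwise. This is only meant to illustrate the shape of the condition; it is not the $T_n$ that would come out of the lemma.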
Dvoretzky also gives an extension to the case where the coefficients are non-negative functions.
The theorem remains valid if $\alpha_n$, $\beta_n$ and $\gamma_n$ are replaced by non-negative functions $\alpha_n(r_1,...,r_n)$, $\beta_n(r_1,...,r_n)$ and $\gamma_n(r_1,...,r_n)$, respectively, provided they satisfy the following conditions:
The functions $\alpha_n(r_1,...,r_n)$ are uniformly bounded and \begin{equation} \label{condDvoretzky:2.10} \lim_{n\to\infty}\alpha_n(r_1,...,r_n) = 0 \end{equation} uniformly for all sequences $r_1,...,r_n$.
The functions $\beta_n(r_1,...,r_n)$ are measurable and \begin{equation} \label{condDvoretzky:2.11} \sum_{n=1}^{\infty}\beta_n(r_1,...,r_n) \end{equation} is uniformly bounded and uniformly convergent for all sequences $r_1,...,r_n$.
The functions $\gamma_n(r_1,...,r_n)$ satisfy \begin{equation} \label{condDvoretzky:2.12} \sum_{n=1}^{\infty}\gamma_n(r_1,...,r_n) = \infty \end{equation} uniformly for all sequences $r_1,...,r_n$, for which \begin{equation} \sup_{n \geq 1} |r_n| < L, \end{equation} $L < \infty$ being an arbitrary number.
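(As I read the uniformity requirements, and I may be misinterpreting them, they amount to \begin{equation} \lim_{n\to\infty}\, \sup_{r_1,...,r_n} \alpha_n(r_1,...,r_n) = 0, \qquad \lim_{n\to\infty}\, \sup_{r_1, r_2, ...}\, \sum_{m=n}^{\infty} \beta_m(r_1,...,r_m) = 0, \end{equation} together with a uniform bound on the $\alpha_n$ and on the partial sums of the $\beta_n$, and, for the last condition: for every $L < \infty$ and every $M > 0$ there is an $N$ such that $\sum_{n=1}^{N}\gamma_n(r_1,...,r_n) \geq M$ for every sequence with $\sup_{n \geq 1}|r_n| < L$.)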
This theorem can be found in Dvoretzky's (1956) paper (link at the end of the post).
However, I can't find a way to relate the lemma to Dvoretzky's theorem, nor can I find a paper that proves the result, even though the first paper presents it as standard.
I don't know how to identify the coefficients $\alpha_n$, $\beta_n$ and $\gamma_n$ of Dvoretzky's theorem with the ones that appear in the lemma. I also don't know how the function $T_n$ could be written, or what $Y_n$ would be, in the setting of the lemma.
Is the lemma actually easy to derive from the theorem? I would appreciate any help with this, and a reference to an article or book where this result (or something similar) is proved would be perfect.
References
Tommi Jaakkola, Michael Jordan, Satinder Singh, On the convergence of stochastic iterative dynamic programming algorithms, AI Memo 1441. August 6, 1993.
Aryeh Dvoretzky, On stochastic approximation, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1956.