
In the paper *On the Convergence of Stochastic Iterative Dynamic Programming Algorithms* (Jaakkola et al., 1994) the authors claim that a statement is "easy to show". I am starting to suspect that it isn't easy, and that they might have just wanted to avoid having to show it. But I thought I would post this to see if I missed something obvious.

[Image of the lemma statement from the paper (not reproduced). From the discussion below, it involves the process $X_{n+1}(x)=(1-\alpha_n(x))X_n(x)+\gamma\beta_n(x)\|X_n\|$ for $x$ in the domain $S$, step sizes with $\sum_n \alpha_n=\infty$ and $\sum_n \alpha_n^2<\infty$, the condition $E[\beta_n\mid P_n]\le E[\alpha_n\mid P_n]$ uniformly w.p. 1, and the claim that $X_n$ converges to $0$ w.p. 1.]

What I have so far:

  1. Since $X_n$ is bounded (it lives in a compact set), it certainly has convergent subsequences
  2. $|X_{n+1}-X_n|\le (\alpha_n+\gamma \beta_n)C_1$, which implies that $X_{n+1}-X_n$ converges to zero; but even combined with the first point, that isn't enough to make $(X_n)$ a Cauchy sequence
  3. $\liminf X_n\ge 0$: $$ \begin{align} X_{n+1}(x)&=(1-\alpha_n(x))X_n(x) + \gamma\beta_n(x)\|X_n\| \\ &\ge (1-\alpha_n(x))X_n(x)\\ &\ge \prod^n_{k=0}(1-\alpha_k(x))\,X_0(x) \to 0 \end{align}$$ where the last line iterates the inequality (legitimate once $1-\alpha_k(x)\ge 0$, see point 4), and the product tends to zero because $1-\alpha\le e^{-\alpha}$ and $\sum \alpha_n=\infty$ (c.f. Infinite Product).
  4. Because of $\alpha_n(x) \to 0$ we know that $1-\alpha_n(x)\ge 0$ for almost all $n$, and if $X_n(x)\ge 0$ then $$ X_{n+1}(x) = \underbrace{(1-\alpha_n(x))}_{\ge 0} \underbrace{X_n(x)}_{\ge 0} +\underbrace{\gamma\beta_n(x)\|X_n\|}_{\ge 0} \ge 0, $$ so once one element of the sequence is nonnegative, all following elements are nonnegative too. The sequences which stay negative converge to zero (by $\liminf X_n\ge 0$); the other sequences are nonnegative for almost all $n$.
  5. For $\|X_n\|$ not to converge, we need $\|X_n\|=\max_x X_n(x)$ (the maximum attained at a nonnegative coordinate) for infinitely many $n$: if it were equal to the maximum over the negative sequences for almost all $n$, it would converge, since $$\|X_n\|=\max_x \bigl(-X_n(x)\bigr) \le \max_x \Bigl(-\prod_{k=0}^{n-1} (1-\alpha_k(x))\,X_0(x)\Bigr) \to 0. $$
  6. If we set $\beta_n=0$ we get $$X_m=\prod_{k=n}^{m-1} (1-\alpha_k)X_n \to 0.$$ So my intuition is: since $\beta_n$ is smaller than $\alpha_n$ (on average, by the lemma's assumptions), replacing $\beta_n$ with $\alpha_n$ should probably be fine, since it only increases the distance from zero. So I think of going in the direction $$X_{n+1}\sim (1-\alpha_n)X_n +\gamma \alpha_n X_n = (1-(1-\gamma)\alpha_n)X_n,$$ which is fine since $\sum(1-\gamma)\alpha_n =\infty$ for $\gamma\in(0,1)$. (A numerical sanity check of this heuristic follows the list.)
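
Not a proof, but a quick numerical sanity check of the heuristic in point 6. All concrete choices below are my own assumptions, picked only so that the lemma's conditions hold: $|S|=5$, $\gamma=0.5$, and $\alpha_n(x),\beta_n(x)$ drawn uniformly from $(0, 2/(n+2))$, which gives $\sum \alpha_n=\infty$, $\sum \alpha_n^2<\infty$ and $E[\beta_n\mid P_n]= E[\alpha_n\mid P_n]$.

```python
import numpy as np

# Simulate X_{n+1}(x) = (1 - alpha_n(x)) X_n(x) + gamma * beta_n(x) * ||X_n||
# with ||.|| taken to be the max norm. All parameters are illustrative guesses.
rng = np.random.default_rng(0)
gamma = 0.5
X = rng.uniform(-1.0, 1.0, size=5)  # X_0, bounded by C_1 = 1

for n in range(200_000):
    alpha = rng.uniform(0.0, 2.0 / (n + 2), size=5)
    beta = rng.uniform(0.0, 2.0 / (n + 2), size=5)  # same conditional mean as alpha
    X = (1.0 - alpha) * X + gamma * beta * np.max(np.abs(X))

print(np.max(np.abs(X)))  # drifts towards 0, roughly like n^{-(1 - gamma)}
```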

But I still need to formalize replacing $\beta_n$ with $\alpha_n$, which only works if I take the expected value, and I don't know whether taking expectations leaves the infinite sums intact. I also have to justify replacing the norm with just one element. I think I can assume that the norm is the max norm without disrupting the later proofs, and since $\liminf X_n\ge 0$, $|X_n|$ is basically equal to $X_n$.
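
To make the expectation step concrete, here is a sketch of the bound I am after, assuming $X_n$ is $P_n$-measurable and that the max norm is attained at the coordinate $x$ in question, so that $\|X_n\|=X_n(x)\ge 0$: $$ \begin{align} E[X_{n+1}(x)\mid P_n] &= \bigl(1-E[\alpha_n(x)\mid P_n]\bigr)X_n(x) + \gamma\, E[\beta_n(x)\mid P_n]\,\|X_n\| \\ &\le \bigl(1-(1-\gamma)E[\alpha_n(x)\mid P_n]\bigr)X_n(x), \end{align}$$ using $E[\beta_n(x)\mid P_n]\le E[\alpha_n(x)\mid P_n]$ in the second line. This is the contraction from point 6, but only in conditional expectation, and I do not see how to pass from it to almost sure convergence.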

I am also a bit confused, since the approach I am currently following would show that it converges to $0$ directly, while the proof wants me to show that it converges to some $X^*$ and then continues with arguments for how to show that it converges to $0$ from there. This makes me think that I am not on the "intended proof path". So maybe I am missing something obvious which could save me a lot of trouble, especially since they claim it should be easy.

  • You say you haven't used $\sum_n \alpha_n^2 < \infty$. How did you deduce $\alpha_n \to 0$? Also, should the $\alpha$ in the first equation be an $\alpha_n$? Also, why can you scale the $X_n$'s to be uniformly bounded by some $C$? How do you know they can't blow up? – mathworker21 Mar 15 '19 at 17:05
  • @mathworker21 1. Good point (forgot). 2. Yes (I assume – it wouldn't make sense otherwise). 3. Yeah, that is noted sloppily: they write $X_{n+1}=\dots$ unconditionally, when it should be $X_{n+1}= \dots$ only if that is smaller than $C_1$, and otherwise scaled by multiplying with some $\lambda^k$ (where $\lambda<1$). They use that in a different lemma to bootstrap convergence: since for $X_{n+1}=G(X_n)$ you can push the scaling through $G$, scaling at some later point is equivalent to scaling the starting point. And they argue that if it converges, then you only scale a finite number of times -> it converges anyway – Felix Benning Mar 15 '19 at 18:30
  • I'm still confused about 3. You don't know it converges and thus you don't know that scaling finitely many times is equivalent to scaling an arbitrary number of times. Consider the example $x_{n+1} = 2x_n$. Scaling by $1/3$ (say) an infinite number of times makes all the $x_n$'s equal to $0$, which makes $(x_n)_n$ converge. However, if $x_n = 2^n$, then clearly it does not converge. – mathworker21 Mar 15 '19 at 18:34
  • @mathworker21 Sorry for being unclear: first you assume that you scale just as much as necessary to keep it within the bound. Then you show it converges to 0 if you do that. But if it converges to zero, then there exists some $N$ after which you no longer need to scale, as it stays smaller than $C$ from that point on (since you only ever scale as much as necessary, the scaling would not be enough to keep it in an epsilon environment around 0). This means you only scale a finite number of times. – Felix Benning Mar 15 '19 at 18:43
  • ... So in your case you are scaling too much: if you start with $x_1=1$, you would scale by $1/3$ only when the sequence passes a threshold, say $3$ (here that happens every other step), and only as much as needed. So 1->2->4/3->8/3->16/9 (a toy sketch of this rule follows the comment thread) – Felix Benning Mar 15 '19 at 18:46
  • What is the meaning of $|.|$? – user159517 Mar 15 '19 at 18:47
  • @FelixB. I think our disagreement is stemming from the definition of scaling. I thought you had to scale each $X_n$ so that $X_{n+1} = (1-\alpha_n)X_n+\gamma \beta_n ||X_n||$ still holds for each $n$. In my $x_{n+1} = 2x_n$ example, you scale each $x_n$ by $1/3$ every time you scale, so all the $x_n$'s would end up at 0 – mathworker21 Mar 15 '19 at 18:48
  • @user159517 the paper he linked to defined $||X_n|| = \max_x |X_n(x)|$. – mathworker21 Mar 15 '19 at 18:49
  • @user159517 it is a norm – while they haven't specified which one, I think it is safe to assume it is the maximum norm – Felix Benning Mar 15 '19 at 18:49
  • @FelixB. If you scale your way (i.e. only a specific $X_n$), then you can't use the equation $X_{n+1} = (1-\alpha_n)X_n+\gamma \beta_n ||X_n||$ anymore – mathworker21 Mar 15 '19 at 18:50
  • @mathworker21 As I said they were a bit sloppy, but you can still use it as an inequality, and if you stay with a certain $\omega$ in the 1-set for which the $\alpha_n, \beta_n$ have their specified properties, then you can assume that you only ever scale $X_0$ and that the equation holds. – Felix Benning Mar 15 '19 at 18:52
  • How is $\gamma$ defined? Why do you think that $\beta_n$ is smaller than $\alpha_n$ on average? – user159517 Mar 15 '19 at 19:03
  • @user159517 It is not defined in the theorem itself, but it is a discount factor, so assuming $\gamma\in(0,1)$ is appropriate. And regarding the second question: because the second assumption in the lemma states that the expected value is smaller. – Felix Benning Mar 15 '19 at 19:07
  • What is the set $S$? In particular, is $S$ (1) the domain of $X_n$, (2) a set which the theorem guarantees to exists, or (3) any arbitrary finite set? –  Mar 15 '19 at 19:16
  • @Strants it is the domain of $X_n$ – Felix Benning Mar 15 '19 at 19:17
  • What does "$E[\beta_n|P_n]$ uniformly" mean? – Michael Mar 21 '19 at 10:51
  • The sums converge uniformly for all $x$, although reading Dvoretzky's proof it probably needs to be uniform over every possible history – Felix Benning Mar 21 '19 at 17:57
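
A toy illustration of the "scale only as much as necessary" rule discussed in the comments above, with everything made up for the sake of the example: dynamics $x \mapsto 2x$, threshold $C=3$, scale factor $1/3$, reproducing the sequence 1->2->4/3->8/3->16/9.

```python
from fractions import Fraction

# Rescale by 1/3 only when the iterate exceeds the bound C, and only as
# often as needed to get back under it; the dynamics, bound and factor
# are illustrative assumptions, not taken from the paper.
x, C = Fraction(1), Fraction(3)
trajectory = [x]
for _ in range(4):
    x *= 2              # the underlying dynamics x -> 2x
    while x > C:        # scale only when (and as much as) necessary
        x /= 3
    trajectory.append(x)

print([str(f) for f in trajectory])  # ['1', '2', '4/3', '8/3', '16/9']
```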
