
I have the following problem, for which I have already established convergence in the general case. I would now like to show convergence in a lower-dimensional space spanned by basis functions.

General problem

Consider the following recursion and establish that the algorithm converges to a unique solution $ v^* \in \mathbb{R}^{n}$ from any initial guess $v_0$, for a small enough time step $\Delta t$, as $ t \to + \infty$:

$$ v_{t+\Delta t} \left ( \mathbf{x} \right ) = G(v_{t} \left ( \mathbf{x} \right )), \quad \mathbf{x} \in \Omega \subset \mathbb{R}^{n \times k} $$

where $\Omega$ is a compact set, $v_{t} \left ( \mathbf{x} \right ) = \left [ v_t \left ( x^{(1)} \right ) , \cdots , v_t \left ( x^{(n)} \right ) \right ]' $ is the $n$-dimensional vector collecting the images under $v_{t}$ of the rows $x^{(i)} \in \mathbb{R}^{k}$ of $\mathbf{x}$, and $G$ is a continuously differentiable non-linear map from $\mathbb{R}^n$ to $\mathbb{R}^n$.

For more context, the $x^{(i)}$'s are the $n$ points used for the discretization of a $k$-dimensional state space, $v$ is the value function I need to solve for, and $G$ results from the discretization of some PDE $ F \left ( \mathbf{x} , \mathbf{v} , \triangledown \mathbf{v} , \triangledown^2 \mathbf{v} \right ) = 0$ using a finite-difference (FD) scheme. I want to solve an HJB equation using a time-marching algorithm, whose solution is a fixed point of the above recursion. You can abstract from the $\mathbf{x}$'s for now (I come back to them later):

$$ \mathbf{\mathbf{v}}_{t+\Delta t} = G \left ( \mathbf{v}_{t} \right ), \quad \mathbf{v}_t \in \mathbb{R}^{n} $$

Proof: In my setup, I can show that the Jacobian is of the following form

$$ \triangledown_v G(\mathbf{v}) = \left [ (1+\rho \Delta t )\mathbb{I}_n - A \Delta t \right ]^{-1} $$

where $\rho>0$ and $A$ is an $n \times n$ matrix with zero row sums and non-negative off-diagonal elements, i.e.

$$ \sum_{j=1}^{n} a_{ij} = 0 , \ \forall i \quad \text{ and } \quad a_{ij} \geq 0 , \ \forall i \neq j $$

so that $a_{ii} \leq 0$ for all $i$.

Let $B = (1+\rho \Delta t )\mathbb{I}_n - \Delta t A$. By the Gershgorin circle theorem, every eigenvalue $\lambda_i$ of $B$ lies in the disk of center $1+\rho \Delta t - \Delta t a_{ii} = 1+\rho \Delta t + \Delta t \left| a_{ii} \right|$ and radius $R_i = \Delta t \left| a_{ii} \right|$, so its real part is bounded below by $1+\rho \Delta t > 1$. Therefore $B$, and equivalently $ \mathbb{I}_n - \frac{\Delta t}{1+\rho \Delta t} A $, is invertible, and

$$ \left\| \triangledown_v G(\mathbf{v}) \right\|_2 = \left\| \left [ (1+\rho \Delta t )\mathbb{I}_n - \Delta t A \right ]^{-1} \right\| _2 \leq \frac{1}{1+\rho \Delta t} + o \left ( \Delta t \right ) < 1 $$

Hence $G$ is a contraction, and the algorithm converges geometrically to the unique solution $\mathbf{v}^*$ provided that $\Delta t$ is small enough.
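As a sanity check, the Gershgorin bound and the resulting contraction property can be verified numerically. This is a minimal sketch: the random matrix below is a hypothetical stand-in for the actual FD discretization, with the stated sign pattern imposed by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, dt = 50, 0.05, 1e-3

# Hypothetical stand-in for the FD-discretized generator: non-negative
# off-diagonal entries, diagonal chosen so that every row sums to zero.
A = rng.random((n, n))
np.fill_diagonal(A, 0.0)
np.fill_diagonal(A, -A.sum(axis=1))          # a_ii = -sum_{j != i} a_ij <= 0

B = (1.0 + rho * dt) * np.eye(n) - dt * A

# Gershgorin: disks centered at 1 + rho*dt + dt*|a_ii| with radius dt*|a_ii|,
# so every eigenvalue of B has real part >= 1 + rho*dt.
eigs = np.linalg.eigvals(B)
print(eigs.real.min() >= 1.0 + rho * dt - 1e-10)   # True

# B is strictly row diagonally dominant with gap 1 + rho*dt, so the
# Ahlberg-Nilson-Varah bound gives ||B^{-1}||_inf <= 1/(1 + rho*dt) < 1.
print(np.linalg.norm(np.linalg.inv(B), np.inf) <= 1.0 / (1.0 + rho * dt) + 1e-10)  # True
```

Note that the clean contraction bound here is in the $\infty$-norm; passing to the $2$-norm for a non-normal $B$, as in the displayed inequality, requires an extra step.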

Modified problem

Typically in the above problem, $n$ is very large because it corresponds to the number of grid points used for the discretization of $\Omega_{dense}$.

I now want to reduce the dimensionality of this problem by approximating the values $\left \{ v_{i} \right \}_{1}^{n}$ using $M$ basis functions $\left\{ \phi_1(\cdot), \cdots , \phi_M(\cdot) \right\}$ mapping $\Omega_{dense}$ into $\mathbb{R}$ (I chose local polynomials in my case). Typically, $M \ll n$. Let $\tilde{V}$ be the space spanned by the basis functions:

$$ \tilde{V} = \left\{ \tilde{v} : \Omega_{dense} \to \mathbb{R} \ : \ \tilde{v} \left ( \mathbf{x} \right ) = \sum_{m=1}^{M} \phi_m\left ( \mathbf{x} \right ) w_m = \Phi \left ( \mathbf{x} \right ) \cdot \mathbf{w}, \ \mathbf{w} \in \mathbb{R}^M \right\} $$

Now, the problem becomes

$$ \widetilde{\mathbf{v}}_{t+\Delta t} = G(\tilde{\mathbf{v}}_t), \quad \widetilde{\mathbf{v}}_t \in \widetilde{V} $$

where $G$ is the same non-linear map. This problem reduces to finding a fixed point $\mathbf{w}^*$ of the iteration on the coefficients $\mathbf{w}_t \in \mathbb{R}^M$:

$$ \mathbf{w}_{t+\Delta t} = \tilde{G}(\mathbf{w}_t), \quad \mathbf{w}_t \in \mathbb{R}^M $$

I want to reuse the same dataset $\left \{ v(x^{(i)}), 1 \leq i \leq n \right \}$, so the above problem becomes overdetermined. Again, I can show that the Jacobian has a convenient expression:

$$ \triangledown_w \tilde{G} \left ( \mathbf{w} \right ) = \tilde{B} ^{\dagger} \Phi $$

where $\tilde{B} = B \Phi = \left [ (1+\rho \Delta t )\mathbb{I}_n - \Delta t A \right ] \Phi$, $\tilde{B} ^{\dagger} = \left [ \tilde{B}' \tilde{B} \right ]^{-1}\tilde{B}'$ is the (Moore–Penrose) pseudo-inverse, and $\Phi = \left [ \phi_m \left ( x^{(i)} \right ) \right ]_{i,m} \in \mathbb{R}^{n \times M}$ is rectangular. I effectively run a least-squares regression at every iteration, projecting the $n$ values of $v$ onto the space of coefficient vectors of dimension $M$.
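To make the objects concrete, here is a minimal numeric sketch of the projected Jacobian $\tilde{B}^{\dagger}\Phi$. The 1-D grid, the plain monomial basis, and the random $A$ are hypothetical stand-ins for the actual problem (the post uses local polynomials); the sketch only illustrates the shapes and the rank condition.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, rho, dt = 200, 8, 0.05, 1e-3

# Hypothetical 1-D grid and a polynomial basis; plain monomials are
# enough to illustrate the construction of the n x M matrix Phi.
x = np.linspace(0.0, 1.0, n)
Phi = np.vander(x, M, increasing=True)       # n x M, full column rank

# Same construction of A and B as in the general problem.
A = rng.random((n, n))
np.fill_diagonal(A, 0.0)
np.fill_diagonal(A, -A.sum(axis=1))          # zero row sums
B = (1.0 + rho * dt) * np.eye(n) - dt * A

B_tilde = B @ Phi                             # tilde B = B Phi, n x M
J = np.linalg.pinv(B_tilde) @ Phi             # M x M Jacobian of tilde G

print(np.linalg.matrix_rank(B_tilde) == M)    # True: full column rank
print(np.linalg.norm(J, 2))                   # the contraction factor to study
```

The spectral norm of `J` is exactly the quantity the question asks to bound below $1$; the sketch makes no claim that it is, only that the iteration map is a well-defined square matrix once $\tilde{B}$ has full column rank.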

Question

I would like to establish that this modified system still converges to a unique solution $\mathbf{w}^*$, but despite several attempts I get stuck on the fact that $\Phi$ is rectangular, which complicates the problem.

Would you have any ideas on how to prove this?


My attempts led me to notice the following:

  • $A$ still has zero row sums, and $B$ still has all its eigenvalues bounded below (in real part) by $1+\rho \Delta t$.
  • $\tilde{B} = B \Phi$ has full column rank $M$.
  • $\tilde{B}$ admits a singular value decomposition whose singular values should be closely related to those of $B$ (my guess). See here
  • If the basis functions $\phi_m(\cdot)$ were orthonormal, one should have $\phi_m (x) \phi_{m'} (x) = 0 , \ \forall m \neq m', \ \forall x$ and $\sum_{m=1}^{M} \phi_m(x) = 1 , \ \forall x$ (I haven't proved this either; these pointwise conditions look more like those of a nodal, partition-of-unity basis than of orthonormality). For now, my basis functions are not even orthogonal, but I wondered whether a change-of-basis argument could be used.
  • $A'A$ has both zero column and row sums, so $0$ is one of its eigenvalues (the all-ones vector lies in its kernel); being symmetric positive semi-definite, its remaining eigenvalues are non-negative.
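Two of the observations above can be checked numerically. This sketch uses a random $A$ with the stated sign pattern and a hypothetical monomial basis: the singular values of $\tilde{B} = B\Phi$ are bracketed by products of those of $B$ and $\Phi$, and $A'A$ has $0$ as an eigenvalue without all of its eigenvalues vanishing.

```python
import numpy as np

rng = np.random.default_rng(2)
n, M, rho, dt = 100, 6, 0.05, 1e-3

A = rng.random((n, n))
np.fill_diagonal(A, 0.0)
np.fill_diagonal(A, -A.sum(axis=1))           # zero row sums, a_ij >= 0 off-diagonal
B = (1.0 + rho * dt) * np.eye(n) - dt * A
Phi = np.vander(np.linspace(0.0, 1.0, n), M, increasing=True)   # hypothetical basis

# Singular values of B Phi are bracketed by products of those of B and Phi:
# sigma_min(B) sigma_min(Phi) <= sigma_min(B Phi),
# sigma_max(B Phi) <= sigma_max(B) sigma_max(Phi).
sB = np.linalg.svd(B, compute_uv=False)
sP = np.linalg.svd(Phi, compute_uv=False)
sBP = np.linalg.svd(B @ Phi, compute_uv=False)
print(sBP[-1] >= sB[-1] * sP[-1] - 1e-12)     # True
print(sBP[0] <= sB[0] * sP[0] + 1e-12)        # True

# A 1 = 0 implies A'A 1 = 0: the all-ones vector is in the kernel, so 0 is
# an eigenvalue of A'A -- but A'A is PSD and nonzero, so not all of its
# eigenvalues vanish.
ones = np.ones(n)
print(np.allclose(A.T @ A @ ones, 0.0))       # True
print(np.linalg.norm(A.T @ A) > 0)            # True
```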
  • Note that $\left [ \left ( \left ( 1+\rho \Delta t \right ) \mathbb{I}_n - A \right ) \Phi \right ]$ has the same shape as $\Phi$. That being the case, what exactly do you mean when you write $\left [ \left ( \left ( 1+\rho \Delta t \right ) \mathbb{I}_n - A \right ) \Phi \right ]^{-1}$? Also, what exactly is $b$ in this context? The lowercase vector implies that it would be a vector, but you say that $\left [ (1+\rho \Delta t )\mathbb{I}_n - A \Delta t \right ]^{-1} b$ is an expression for the Jacobian, which seems to imply that $b$ is a square matrix. – Ben Grossmann Aug 21 '23 at 17:35
  • Also, should there be a $\Delta t$ multiplying the $A$ in the expression $[((1 + \rho \Delta t) \mathbb I_n - A)\Phi]$? – Ben Grossmann Aug 21 '23 at 17:40
  • My hunch is that, once you iron out some errors, you'll end up with an expression of the form $\left [ \Phi^\top \left ( \left ( 1+\rho \Delta t \right ) \mathbb{I}_n - \Delta t A \right ) \Phi \right ]^{-1}$. If this is the case, then your proof should be fairly easy to generalize. – Ben Grossmann Aug 21 '23 at 17:44
  • Quick correction: in that first comment I meant to say the lowercase letter implies that it would be a vector – Ben Grossmann Aug 21 '23 at 17:45
  • Hi Ben! Thanks for spotting the errors, I fixed it and just dropped the $b$ which wasn't relevant to the problem actually (norm less than $1$). – François Aug 21 '23 at 18:15
  • Yes, the expression for the Jacobian becomes $\left [ \Phi' \left ( (1+\rho \Delta t)\mathbb{I}_n - \Delta t A \right )' \left ( (1+\rho \Delta t)\mathbb{I}_n - \Delta t A \right ) \Phi \right ]^{-1} \Phi' \left ( (1+\rho \Delta t)\mathbb{I}_n - \Delta t A \right )' $ $ = \left [(1+\rho \Delta t)^2 \Phi' \Phi - (1+\rho \Delta t) \Delta t \, \Phi' \left ( A + A' \right ) \Phi + \Delta t^2 \, \Phi' A'A\Phi \right ]^{-1} \Phi' \left ( (1+\rho \Delta t)\mathbb{I}_n - \Delta t A \right )' $ but given that $\Phi$ is rectangular, I don't know which properties to invoke to generalize the proof. Any ideas? – François Aug 21 '23 at 18:21
  • The inverse definitely makes sense now, but I'm not sure how $\tilde G$, which goes from $\Bbb R^m$ to $\Bbb R^m$, could have a rectangular Jacobian – Ben Grossmann Aug 21 '23 at 19:31
  • My bad again (and sorry), I tried to simplify the exposition but missed a $\Phi$ matrix that makes the Jacobian of $\tilde{G}$ square. It should be OK now. I think there may be a way to prove that the eigenvalues of $\tilde{B}' \tilde{B}$ are bounded below by $(1+\rho \Delta t)^2$ while those of $\tilde{B}' \Phi$ are bounded below by $(1+\rho \Delta t)$. However, that isn't enough for the norm of the whole expression to be less than $1$, and I probably need to use the singular values of $\tilde{B}$ directly – François Aug 21 '23 at 19:45
  • Your map $\tilde G$ seems to be of the form $\tilde G(\mathbf w) = \Phi^\dagger G(\Phi \mathbf w)$, is that correct? – Ben Grossmann Aug 21 '23 at 19:45
  • Also, in your original Gershgorin circle based argument, shouldn't $a_{ii}$ be multiplied by $\Delta t$? – Ben Grossmann Aug 21 '23 at 19:58
  • $\tilde{G} (\mathbf{w}) = \left [ \left ( (1+\rho \Delta t )\mathbb{I}_n + A \Delta t \right ) \Phi \right ]^{\dagger} \left ( f (\Phi \mathbf{w}_t ) + \Phi \mathbf{w}_t \right ) $, where the $f$ function drops out by some FOC argument that is related to the problem and not shown here. So to your question, yes, if $\left [ \left ( (1+\rho \Delta t )\mathbb{I}_n + A \Delta t \right ) \Phi \right ]^{\dagger} = \Phi^{\dagger} \left ( (1+\rho \Delta t )\mathbb{I}_n + A \Delta t \right )^{-1} $, but I haven't proven it (it may be trivial) – François Aug 21 '23 at 19:59
  • And yes for the center and radius of the Gershgorin disks although it won't change the lower bound on the eigenvalues – François Aug 21 '23 at 20:02
  • Unfortunately, $(AB)^\dagger = B^\dagger A^{-1}$ does not hold in general. I believe that $B$ would have to have full row-rank in order for that to work, which won't happen here because $\Phi$ is a tall matrix – Ben Grossmann Aug 21 '23 at 20:11
  • Am I correct in saying that your Jacobian should have the form $$ \nabla_w \tilde G = [((1 + \rho\Delta t)\Bbb I + A \Delta t)\Phi]^\dagger(\Bbb I + \nabla_v f)\Phi\,? $$ – Ben Grossmann Aug 21 '23 at 20:17
  • Yes that is correct. But in the general problem, once the invertibility of $B$ is established, you can take an expansion of $B^{-1}$, and the terms for $f$ and $ \triangledown_v f $ cancel using an FOC condition that is related to the underlying optimization problem. I prefer to stay away from this here as the proof really relies on the $B^{-1}$ having a norm strictly less than 1. – François Aug 21 '23 at 20:29
  • Never mind, I'm going in an unproductive direction. I think the form for the Jacobian that you give in terms of $B$ is fine, I'm just thrown off by the fact that the Jacobian is being used directly in the formula for $\tilde G$ – Ben Grossmann Aug 21 '23 at 20:30
  • Is there a reason that your formula for $\tilde G(\mathbf w)$ has the inverse of the Jacobian of $G$ inside of it? This doesn't seem to correspond to your brief description from your question, i.e. where you apply "least-square regression at every iteration and project the $n$ values of $v$ onto the space of real coefficients with dimension $M$". I would think that your dimensionality reduction scheme should turn into the original problem in the case that $M = n$, but that doesn't seem to be the case for your approach – Ben Grossmann Aug 21 '23 at 20:40
  • The full problem is defined for the $n$ values of $v$, and for it I have a guarantee that the algorithm converges with $n$ points, and to the true solution as $n \rightarrow \infty$. The approximated problem starts from the same full problem but condenses it over $M$ points. I'd like to establish convergence of the $\mathbf{w}$ iteration when the system is overdetermined, so as to minimize the error of the approximation scheme $F^h$ everywhere in the state space (more details here) – François Aug 21 '23 at 21:04
  • So they are identical if $n = M$ and the $\phi_m(\cdot)$ are indicator functions for each grid point $m$. – François Aug 21 '23 at 21:04
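On the $(B\Phi)^{\dagger} = \Phi^{\dagger} B^{-1}$ point raised in the comments, a quick random counterexample confirms that the identity fails when $\Phi$ is tall. All matrices below are hypothetical, generic instances, not the problem's actual $B$ and $\Phi$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, M = 30, 4
B = np.eye(n) + 0.05 * rng.standard_normal((n, n))   # invertible, generic
Phi = rng.standard_normal((n, M))                     # tall, full column rank

lhs = np.linalg.pinv(B @ Phi)
rhs = np.linalg.pinv(Phi) @ np.linalg.inv(B)
print(np.allclose(lhs, rhs))   # False: (B Phi)^+ != Phi^+ B^{-1} in general
```

This is consistent with the remark that the factor-reversal rule for pseudo-inverses needs extra rank conditions that a tall $\Phi$ does not satisfy.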
