
Let $\beta^* \in \mathbb R^p$ be a random variable with density $c\exp(-f(\beta))$ and let $\beta(\lambda) \in \mathbb R^p$ be a random variable with density $c_1\exp(-f(\beta) - \lambda \|\beta\|_2^2/2)$, for appropriate normalizing constants $c$ and $c_1$; the two random variables are independent of each other. Also, let $E[\|\beta^*\|^2_2] = \alpha$ and $E[\beta^*] = \delta$. Let $f$ be $m$-strongly convex and $L$-Lipschitz smooth. I am trying to find a strong upper bound on the expectation of $\|\beta^* - \beta(\lambda) \|_2^2$ in terms of $\lambda, \delta, \alpha, L, m$ and $p$: \begin{align*} E \left[ \|\beta^* - \beta(\lambda) \|_2^2 \right] \end{align*}

newbie
  • 81

1 Answer


To begin, we need some connection between $f$ and $(\alpha,\delta)$. Since $f$ is $m$-strongly convex, it has a unique minimum, achieved at, say, $\beta_0$. Write $c_0$ for the normalizing constant of $\beta^*$ (the question's $c$); changing $c_0$ as necessary, I assume $f(\beta_0)=0$. Moreover, as $\beta$ moves away from $\beta_0$, $f$ grows at least quadratically, $f(\beta)\geq\frac{m}{2}\|\beta-\beta_0\|_2^2$, and so the corresponding pdf decays at least as fast as that of a normal distribution. (That's already very fast; see Putanumonit's discussion, but take his soccer conclusions with a grain of salt.)

Thus we estimate \begin{align*} \mathbb{E}[\|\beta^*-\beta_0\|_2^2]&=\int_{\mathbb{R}^p}{\|\beta-\beta_0\|_2^2\cdot c_0e^{-f(\beta)}\,d^p\beta} \\ &\leq\int_{\mathbb{R}^p}{c_0\|\beta-\beta_0\|_2^2e^{-\frac{m}{2}\|\beta-\beta_0\|_2^2}\,d^p\beta} \\ &=c_0\cdot \text{vol}(S^{p-1})\int_0^{\infty}{r^2e^{-\frac{m}{2}r^2}\cdot r^{p-1}\,dr} \\ &=c_0\cdot\frac{2\pi^{\frac{p}{2}}}{\Gamma\left(\frac{p}{2}\right)}\cdot\frac{1}{2}\left(\frac{2}{m}\right)^{\frac{p}{2}+1}\Gamma\left(\frac{p}{2}+1\right) \\ &=\frac{c_0p}{2\pi}\left(\frac{2\pi}{m}\right)^{\frac{p}{2}+1} \end{align*} where $\text{vol}(S^{p-1})$ is the surface area of the unit sphere in $\mathbb{R}^p$ and $\Gamma$ is the Gamma function. By Jensen's inequality, $\|\delta-\beta_0\|_2^2\leq\mathbb{E}[\|\beta^*-\beta_0\|_2^2]$, so whenever this bound is small, $\delta$ and $\beta_0$ roughly coincide.
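One way to trade $c_0$ for $L$, sketched under the assumption that "$L$-Lipschitz smooth" means $\nabla f$ is $L$-Lipschitz, and keeping the normalization $f(\beta_0)=0$: smoothness together with $\nabla f(\beta_0)=0$ gives $f(\beta)\leq\frac{L}{2}\|\beta-\beta_0\|_2^2$, so \begin{align*} \frac{1}{c_0}=\int_{\mathbb{R}^p}{e^{-f(\beta)}\,d^p\beta}&\geq\int_{\mathbb{R}^p}{e^{-\frac{L}{2}\|\beta-\beta_0\|_2^2}\,d^p\beta}=\left(\frac{2\pi}{L}\right)^{\frac{p}{2}} \quad\Longrightarrow\quad c_0\leq\left(\frac{L}{2\pi}\right)^{\frac{p}{2}}, \\ \frac{c_0p}{2\pi}\left(\frac{2\pi}{m}\right)^{\frac{p}{2}+1}&\leq\frac{p}{2\pi}\left(\frac{2\pi}{m}\right)^{\frac{p}{2}+1}\left(\frac{L}{2\pi}\right)^{\frac{p}{2}}=\frac{p}{m}\left(\frac{L}{m}\right)^{\frac{p}{2}} \end{align*} As a sanity check, when $f$ is exactly quadratic we have $L=m$, and the right-hand side reduces to $p/m$, the trace of the covariance of a $N(\beta_0,I/m)$ vector.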

(Essentially, we just performed Laplace's method.)

We want to do the same with $\beta(\lambda)$, since then the difference is $$\beta(\lambda)-\beta^*=\beta(\lambda)-\beta_0+\beta_0-\beta^*$$ But the mean is not quite so straightforward. (Throughout, I write the penalty as $\lambda\|\beta\|_2^2$; for the question's $\lambda\|\beta\|_2^2/2$, replace $\lambda$ by $\lambda/2$ everywhere below.) If we approximate $\beta(\lambda)$ as a normal, then it is centered around $\left(\frac{m}{m+2\lambda}\right)\beta_0$: $$\frac{m}{2}\|\beta-\beta_0\|_2^2+\lambda\|\beta\|_2^2=\left(\frac{m}{2}+\lambda\right)\left\|\beta-\left(\frac{m}{m+2\lambda}\right)\beta_0\right\|_2^2+\frac{m}{2}\left(1-\frac{m}{m+2\lambda}\right)\|\beta_0\|_2^2$$ just from completing the square. The constant term is nonnegative, so it only shrinks the density; as before, I will absorb it into $c_1$, so that our bound is $$\text{pdf}_{\beta(\lambda)}(\beta)\leq c_1e^{-\left(\frac{m}{2}+\lambda\right)\left\|\beta-\frac{m}{m+2\lambda}\beta_0\right\|_2^2}$$
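Completing the square is easy to get wrong, so here is a small numerical check in Python (standard library only). It is a sketch, not part of the argument: the function names are made up for illustration, the penalty is taken as $\lambda\|\beta\|_2^2$ as in the display above, and the constant term is written out as $\frac{m\lambda}{m+2\lambda}\|\beta_0\|_2^2$, which equals $\frac{m}{2}\left(1-\frac{m}{m+2\lambda}\right)\|\beta_0\|_2^2$.

```python
import random


def quadratic_plus_penalty(beta, beta0, m, lam):
    """(m/2) * ||beta - beta0||^2 + lam * ||beta||^2."""
    dist2 = sum((b - b0) ** 2 for b, b0 in zip(beta, beta0))
    norm2 = sum(b * b for b in beta)
    return 0.5 * m * dist2 + lam * norm2


def completed_square(beta, beta0, m, lam):
    """(m/2 + lam) * ||beta - s*beta0||^2 + (m*lam/(m+2*lam)) * ||beta0||^2,
    where s = m / (m + 2*lam) is the shrinkage factor of the center."""
    shrink = m / (m + 2 * lam)
    dist2 = sum((b - shrink * b0) ** 2 for b, b0 in zip(beta, beta0))
    norm0 = sum(b0 * b0 for b0 in beta0)
    return (0.5 * m + lam) * dist2 + (m * lam / (m + 2 * lam)) * norm0


def max_gap(trials=1000, p=5, seed=0):
    """Largest absolute difference between the two sides over random inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        beta = [rng.gauss(0, 1) for _ in range(p)]
        beta0 = [rng.gauss(0, 1) for _ in range(p)]
        m = rng.uniform(0.1, 5.0)
        lam = rng.uniform(0.0, 5.0)
        gap = abs(quadratic_plus_penalty(beta, beta0, m, lam)
                  - completed_square(beta, beta0, m, lam))
        worst = max(worst, gap)
    return worst
```

Calling `max_gap()` should return a value at the level of floating-point rounding error, which is what one expects if the two sides agree identically.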

Once I have that estimate, though, the exact same argument goes through, with $m$ replaced by $m+2\lambda$: $$\mathbb{E}\left[\left\|\beta(\lambda)-\frac{m}{m+2\lambda}\beta_0\right\|_2^2\right]\leq\frac{c_1p}{2\pi}\left(\frac{2\pi}{m+2\lambda}\right)^{\frac{p}{2}+1}$$
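The Gaussian moment computation is used twice, once with rate $m$ and once with rate $m+2\lambda$, so it may be worth checking the closed form numerically. The following Python sketch (standard library only; the function names and the truncation radius are illustrative choices, not part of the argument) compares $\frac{cp}{2\pi}\left(\frac{2\pi}{k}\right)^{\frac{p}{2}+1}$ against a midpoint-rule evaluation of $c\cdot\text{vol}(S^{p-1})\int_0^\infty r^{p+1}e^{-\frac{k}{2}r^2}\,dr$.

```python
import math


def closed_form(c, k, p):
    """(c*p / (2*pi)) * (2*pi / k)^(p/2 + 1)."""
    return c * p / (2 * math.pi) * (2 * math.pi / k) ** (p / 2 + 1)


def radial_quadrature(c, k, p, steps=20000):
    """Midpoint rule for c * vol(S^{p-1}) * int_0^R r^(p+1) exp(-k r^2 / 2) dr.

    The upper limit R = 15 / sqrt(k) is an ad hoc truncation; for small p the
    integrand is negligible beyond it.
    """
    sphere_area = 2 * math.pi ** (p / 2) / math.gamma(p / 2)
    r_max = 15.0 / math.sqrt(k)
    h = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += r ** (p + 1) * math.exp(-0.5 * k * r * r)
    return c * sphere_area * total * h


def relative_error(c, k, p):
    """Relative gap between the quadrature and the closed form."""
    exact = closed_form(c, k, p)
    return abs(radial_quadrature(c, k, p) - exact) / exact
```

For example, `relative_error(1.0, 2.0, 3)` should be tiny; taking `k = m + 2 * lam` gives the version of the formula used for $\beta(\lambda)$.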

Now, by the inequality $$\|a+b+c\|^2\leq3(\|a\|^2+\|b\|^2+\|c\|^2)$$ (true in any inner product space), we have \begin{align*} \mathbb{E}[\|\beta^*-\beta(\lambda)\|_2^2]&=\mathbb{E}\left[\left\|\left(\beta^*-\beta_0\right)+\frac{2\lambda}{m+2\lambda}\beta_0+\left(\left(\frac{m}{m+2\lambda}\right)\beta_0-\beta(\lambda)\right)\right\|_2^2\right] \\ &\leq3\left(\mathbb{E}[\|\beta^*-\beta_0\|_2^2]+\frac{2\lambda}{m+2\lambda}\|\beta_0\|_2^2+\mathbb{E}\left[\left\|\left(\frac{m}{m+2\lambda}\right)\beta_0-\beta(\lambda)\right\|_2^2\right]\right) \\ &\leq3\left(\frac{c_0p}{2\pi}\left(\frac{2\pi}{m}\right)^{\frac{p}{2}+1}+\frac{2\lambda}{m+2\lambda}\|\beta_0\|_2^2+\frac{c_1p}{2\pi}\left(\frac{2\pi}{m+2\lambda}\right)^{\frac{p}{2}+1}\right) \end{align*} (The middle term is really $\left(\frac{2\lambda}{m+2\lambda}\right)^2\|\beta_0\|_2^2$; since the factor is at most $1$, I have dropped the square, which only loosens the bound.) If you want to sharpen this, you can do a little better by computing the cross terms explicitly. But I don't think they'll be leading-order.

In any case, we're almost done. The only term we haven't expressed in terms of our original parameters is $\|\beta_0\|_2^2$. Well, \begin{align*} \left|\|\beta_0\|_2^2-\|\beta^*\|_2^2\right|&=\left|\|\beta^*+(\beta_0-\beta^*)\|_2^2-\|\beta^*\|_2^2\right| \\ &=\left|\|\beta^*-\beta_0\|_2^2+2\langle\beta^*,\beta_0-\beta^*\rangle\right| \\ &\leq\|\beta^*-\beta_0\|_2^2+2\|\beta^*\|_2\|\beta_0-\beta^*\|_2 \end{align*} where the last line is by the triangle inequality and Cauchy-Schwarz. Taking expectations (and using $|\mathbb{E}[X]|\leq\mathbb{E}[|X|]$ on the left), we have $$\left|\|\beta_0\|_2^2-\alpha\right|\leq\frac{c_0p}{2\pi}\left(\frac{2\pi}{m}\right)^{\frac{p}{2}+1}+2\mathbb{E}\left[\|\beta^*\|_2\|\beta_0-\beta^*\|_2\right]$$ The two norms in the last term are not independent, so the expectation does not factor. Instead, by Cauchy-Schwarz for expectations, any nonnegative random variables $X$ and $Y$ satisfy $\mathbb{E}[XY]\leq\sqrt{\mathbb{E}[X^2]}\sqrt{\mathbb{E}[Y^2]}$. Applying this with $X=\|\beta^*\|_2$ and $Y=\|\beta_0-\beta^*\|_2$ gives $$\left|\|\beta_0\|_2^2-\alpha\right|\leq\frac{c_0p}{2\pi}\left(\frac{2\pi}{m}\right)^{\frac{p}{2}+1}+2\sqrt{\alpha}\sqrt{\frac{c_0p}{2\pi}\left(\frac{2\pi}{m}\right)^{\frac{p}{2}+1}}$$

Putting it all together, \begin{align*} \mathbb{E}[\|\beta^*-\beta(\lambda)\|_2^2]&\leq\frac{3p}{2\pi}\left(\frac{2\pi}{m}\right)^{\frac{p}{2}+1}\left(c_0+c_1\left(\frac{m}{m+2\lambda}\right)^{\frac{p}{2}+1}\right)+{} \\ &\qquad\frac{6\lambda}{m+2\lambda}\left(\sqrt{\alpha}+\sqrt{\frac{c_0p}{2\pi}\left(\frac{2\pi}{m}\right)^{\frac{p}{2}+1}}\right)^2 \end{align*}
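To get a feel for how loose the final bound is, one can evaluate it in the one case where everything is explicit: $f(\beta)=\frac{m}{2}\|\beta-\beta_0\|_2^2$ with penalty $\lambda\|\beta\|_2^2$. Then $\beta^*\sim N(\beta_0,I/m)$ and $\beta(\lambda)\sim N\left(\frac{m}{m+2\lambda}\beta_0,\frac{1}{m+2\lambda}I\right)$, so the exact value is $\frac{p}{m}+\frac{p}{m+2\lambda}+\left(\frac{2\lambda}{m+2\lambda}\right)^2\|\beta_0\|_2^2$. The Python sketch below (standard library only) is an illustration under those assumptions; the function names are hypothetical, and `c1` is the constant after the completing-the-square term has been absorbed into it.

```python
import math


def gaussian_moment_bound(c, k, p):
    """(c*p / (2*pi)) * (2*pi / k)^(p/2 + 1)."""
    return c * p / (2 * math.pi) * (2 * math.pi / k) ** (p / 2 + 1)


def final_bound(alpha, m, lam, p, c0, c1):
    """Right-hand side of the final inequality for E||beta* - beta(lam)||^2."""
    a0 = gaussian_moment_bound(c0, m, p)
    a1 = gaussian_moment_bound(c1, m + 2 * lam, p)
    shift = 6 * lam / (m + 2 * lam)
    return 3 * (a0 + a1) + shift * (math.sqrt(alpha) + math.sqrt(a0)) ** 2


def gaussian_example(m, lam, beta0):
    """Return (exact value, bound) when f(beta) = (m/2)||beta - beta0||^2."""
    p = len(beta0)
    norm0 = sum(b * b for b in beta0)
    c0 = (m / (2 * math.pi)) ** (p / 2)              # N(beta0, I/m) normalizer
    c1 = ((m + 2 * lam) / (2 * math.pi)) ** (p / 2)  # absorbed-constant c_1
    alpha = norm0 + p / m                            # E||beta*||^2
    exact = p / m + p / (m + 2 * lam) + (2 * lam / (m + 2 * lam)) ** 2 * norm0
    return exact, final_bound(alpha, m, lam, p, c0, c1)
```

With `lam = 0` the two distributions coincide, the exact value is $2p/m$, and the bound evaluates to $6p/m$: the factor of $3$ comes entirely from the three-term inequality, which is the price of ignoring the cross terms.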

Jacob Manaker
  • 10,173