
This question addresses an argument in the convergence proof of an SQP method that I found in a textbook.

Let $f \colon \mathbb R^n \to \mathbb R$ be twice continuously differentiable. Let there be given sequences of directions $s_k \to \hat s$ in $\mathbb R^n$ and of step lengths $\sigma_k \to 0$ in $\mathbb R$. Also, let $x_k \to \hat x$ in $\mathbb{R}^n$ with $x_{k+1} = x_k + \sigma_k s_k$.

I wish to prove that

$$\begin{equation}\tag{1}\label{eq:lemma} \lim_{k\to\infty} \left[ \frac{1}{\sigma_k}\big( f(x_k + \sigma_k s_k) - f(x_k) \big) - \nabla f(x_k)^\top s_k \right] = 0. \end{equation}$$


By the definition of the directional derivative $\nabla f(x_k)^\top s_k$ and by continuity of $\nabla f,$ we have

$$\begin{align}\tag{2}\label{eq:limits} \lim_{x_k \to \hat x} \lim_{s_k \to \hat s} \lim_{\sigma_k \to 0} \frac{f(x_k + \sigma_k s_k) - f(x_k)}{\sigma_k} = \lim_{x_k \to \hat x} \lim_{s_k \to \hat s} \nabla f(x_k)^\top s_k = \nabla f(\hat x)^\top \hat s. \end{align}$$

I have a couple of questions:

  1. In order to use \eqref{eq:limits} for \eqref{eq:lemma}, I need to argue why it is valid to compute the limits of $\sigma_k,s_k,x_k$ one after another, even though in \eqref{eq:limits} they are approached in parallel. Can you help?

  2. Does \eqref{eq:lemma} hold true for any converging sequence $x_k \to \hat x$?

  3. Does \eqref{eq:lemma} hold true for any sequence $x_k$, not necessarily converging?

Lumen
    I believe for the first question, you are dealing with a uniform differentiability condition (a multivariable version of what is found in the top answer here https://math.stackexchange.com/questions/240296/what-is-the-precise-definition-of-uniformly-differentiable). I believe the difference quotients $\frac{1}{\sigma}\big(f(x+\sigma s)-f(x)\big)$ will approach $\nabla f(x)^\top s$ uniformly for all $x, \sigma, s$ uniformly close to $\hat{x}$, $0$, and $\hat{s}$, respectively.

    If that approach works, I don't see why it wouldn't also give a positive answer for question 2.

    –  Sep 14 '18 at 10:41
    For the third question, the answer is no. The reason is that the argument I mentioned above has uniformity over bounded sets (and we are controlling $x_k$ by keeping it near $\hat x$). If we lose this, we lose the uniformity. For example, take $f(t)=t^3$, $x_k=k$, $\sigma_k=1/k$, $s_k=s=1$. Then $$\frac{f(x_k+\sigma_k s_k)-f(x_k)}{\sigma_k} = 3k^2+3+\frac{1}{k^2},$$ while $$f'(x_k)s_k=3k^2.$$ Thus the difference tends to $3$, not $0$. –  Sep 14 '18 at 10:47
  • @bangs Many thanks for your comments. I have to think about the argument of uniform differentiability for a moment. I actually thought about the same counterexample that you provided, but failed to properly compute the value of the difference quotient, LOL. – Lumen Sep 14 '18 at 11:15
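The counterexample from the comments can be checked numerically. The following script (a hypothetical illustration, not part of the original thread) confirms that with the unbounded sequence $x_k = k$ the gap between the difference quotient and $f'(x_k)s_k$ stays near $3$ instead of vanishing:

```python
# Counterexample check: f(t) = t^3, x_k = k, sigma_k = 1/k, s_k = 1,
# so x_k is unbounded and uniformity over bounded sets is lost.
def f(t):
    return t ** 3

def df(t):
    return 3 * t ** 2

gaps = []
for k in [10, 100, 1000]:
    x, sigma, s = float(k), 1.0 / k, 1.0
    quotient = (f(x + sigma * s) - f(x)) / sigma  # exactly 3k^2 + 3 + 1/k^2
    gaps.append(quotient - df(x) * s)             # = 3 + 1/k^2, bounded away from 0
print(gaps)
```

Each gap is $3 + 1/k^2$ (up to floating-point rounding), so the expression in \eqref{eq:lemma} does not tend to $0$ along this sequence.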

1 Answer


By the mean value theorem applied to $t \mapsto f(x_k + t\sigma_k s_k)$, there is a number $\alpha_k \in\;]0,1[$ such that $$ f(x_k + \sigma_k s_k) - f(x_k) = \sigma_k \nabla f(x_k + \alpha_k\sigma_k s_k)^\top s_k. $$ Subtracting $\sigma_k \nabla f(x_k)^\top s_k$ on both sides and dividing by $\sigma_k$ leads to $$ \frac{1}{\sigma_k}\big(f(x_k + \sigma_k s_k) - f(x_k)\big) - \nabla f(x_k)^\top s_k = \big(\nabla f(x_k + \alpha_k\sigma_k s_k) - \nabla f(x_k)\big)^\top s_k. $$ Applying the mean value theorem once more, now to $t \mapsto \nabla f(x_k + t\alpha_k\sigma_k s_k)^\top s_k$, there is some $\beta_k \in \;]0,1[$ such that $$ \big(\nabla f(x_k + \alpha_k\sigma_k s_k) - \nabla f(x_k)\big)^\top s_k = \alpha_k\sigma_k\, s_k^\top \nabla^2 f(x_k + \alpha_k\beta_k\sigma_k s_k)\, s_k, $$ and hence, since $\alpha_k \leq 1$, $$\begin{align} \left\lvert\big(\nabla f(x_k + \alpha_k\sigma_k s_k) - \nabla f(x_k)\big)^\top s_k\right\rvert &\leq \alpha_k\sigma_k \left\lVert\nabla^2 f(x_k + \alpha_k\beta_k\sigma_k s_k)\right\rVert \cdot\lVert s_k\rVert^2 \\ &\leq\sigma_k\left\lVert\nabla^2 f(x_k + \alpha_k\beta_k\sigma_k s_k)\right\rVert \cdot\lVert s_k\rVert^2 \to 0, \end{align}$$ since the evaluation points $x_k + \alpha_k\beta_k\sigma_k s_k \to \hat x$ stay in a bounded set on which the continuous Hessian $\nabla^2 f$ is bounded, and $\lVert s_k\rVert \to \lVert\hat s\rVert$ is bounded as well. This proves \eqref{eq:lemma}. Note that the argument only needs $x_k$ and $s_k$ to remain bounded, not to converge, which is consistent with the unbounded counterexample in the comments.
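The $O(\sigma_k)$ behaviour predicted by the Hessian bound can also be observed numerically. The sketch below (a hypothetical illustration with an arbitrarily chosen $C^2$ function, not from the original answer) shows the error in \eqref{eq:lemma} shrinking roughly in proportion to $\sigma_k = 1/k$:

```python
import numpy as np

# Error term from (1): (f(x_k + sigma_k s_k) - f(x_k))/sigma_k - grad f(x_k)^T s_k.
# For a C^2 function with x_k, s_k bounded, the proof bounds it by
# sigma_k * ||Hessian|| * ||s_k||^2, i.e. O(sigma_k).
def f(x):
    return np.exp(x[0]) + np.sin(x[1])

def grad_f(x):
    return np.array([np.exp(x[0]), np.cos(x[1])])

x_hat = np.array([0.5, -1.0])
s_hat = np.array([1.0, 2.0])

errs = []
for k in [10, 100, 1000]:
    x_k = x_hat + np.array([1.0, -1.0]) / k  # x_k -> x_hat
    s_k = s_hat + np.array([0.0, 1.0]) / k   # s_k -> s_hat
    sigma_k = 1.0 / k                        # sigma_k -> 0
    err = (f(x_k + sigma_k * s_k) - f(x_k)) / sigma_k - grad_f(x_k) @ s_k
    errs.append(abs(err))
print(errs)
```

Each tenfold increase in $k$ shrinks the error by roughly a factor of ten, matching the $O(\sigma_k)$ bound.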

Lumen