
I am currently studying the textbook Algorithms for Optimization by Mykel J. Kochenderfer and Tim A. Wheeler. Chapter 1.6.2 Multivariate says the following:

The following conditions are necessary for $\mathbf{x}$ to be at a local minimum of $f$:

  1. $\nabla f(\mathbf{x}) = 0$, the first-order necessary condition (FONC)

  2. $\nabla^2 f(\mathbf{x})$ is positive semidefinite (for a review of this definition, see appendix C.6), the second-order necessary condition (SONC)

The FONC and SONC are generalizations of the univariate case. The FONC tells us that the function is not changing at $\mathbf{x}$. Figure 1.8 shows examples of multivariate functions where the FONC is satisfied. The SONC tells us that $\mathbf{x}$ is in a bowl.

The FONC and SONC can be obtained from a simple analysis. In order for $\mathbf{x}^*$ to be at a local minimum, it must be no larger than the values around it:

$$f(\mathbf{x}^*) \le f(\mathbf{x}^* + h \mathbf{y}) \iff f(\mathbf{x}^* + h\mathbf{y}) - f(\mathbf{x}^*) \ge 0 \tag{1.14}$$

If we write the second-order Taylor approximation of $f$ about $\mathbf{x}^*$, we get:

$$ f(\mathbf{x}^* + h \mathbf{y}) = f(\mathbf{x}^*) + h \nabla f(\mathbf{x}^*)^T \mathbf{y} + \dfrac{1}{2} h^2 \mathbf{y}^T \nabla^2 f(\mathbf{x}^*)\mathbf{y} + O(h^3) \tag{1.15}$$
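
To be clear about what I do and don't follow: the FONC/SONC conditions themselves make sense to me. For instance, here is a quick numerical check (my own illustration in NumPy, not from the book) using $f(\mathbf{x}) = x_1^2 + 3x_2^2$, which has its minimizer at the origin:

```python
import numpy as np

# f(x) = x1^2 + 3*x2^2 has a (global) minimum at the origin.
def grad_f(x):
    # Analytic gradient of f.
    return np.array([2.0 * x[0], 6.0 * x[1]])

def hess_f(x):
    # Analytic Hessian of f (constant, since f is quadratic).
    return np.array([[2.0, 0.0],
                     [0.0, 6.0]])

x_star = np.zeros(2)

# FONC: the gradient vanishes at x*.
print(grad_f(x_star))                      # [0. 0.]

# SONC: the Hessian at x* is positive semidefinite
# (all eigenvalues are nonnegative).
print(np.linalg.eigvalsh(hess_f(x_star)))  # [2. 6.]
```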

I'm wondering where the $O(h^3)$ term in (1.15) comes from. I can't see why it would be there algebraically.

I would appreciate it if someone would please take the time to clarify this.

The Pointer
  • https://math.stackexchange.com/questions/501735/why-do-we-use-big-oh-in-taylor-series – LinAlg Jan 28 '20 at 02:06
  • @LinAlg Oh, now I understand my confusion. I was thinking of big-$O$ notation in terms of algorithmic complexity, but it's used here to represent something different: It is used to signify that the third order (and higher) terms of the Taylor expansion are not relevant for describing the local behaviour of the function. For more information, see the answers by user96131 and Levi in the above question. – The Pointer Jan 31 '20 at 15:14
  • @LinAlg Do you want to post a small clarifying answer? That way, I can award someone the bounty, rather than let it go to waste. Perhaps just putting it in clearer terms than I have here, so as to ensure that there are no ambiguities. – The Pointer Jan 31 '20 at 15:18
  • sure, thanks, much appreciated! – LinAlg Feb 01 '20 at 02:57
  • @LinAlg Thank YOU! – The Pointer Feb 01 '20 at 04:12

3 Answers


The term $O(h^3)$ means that the estimation error is locally bounded by a constant times $|h|^3$. For example, the second-order estimate of $f(x)=e^x$ at $x=2$ is $g(x) = e^2(0.5x^2 - x + 1)$, so $f(x) = g(x) + O((x-2)^3)$. The higher the power in the error term, the more rapidly the error goes to $0$ as $x\to 2$. Note that this example shows that the error bound is local: it does not hold on the entirety of $\mathbb{R}$.
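
A quick numerical sketch of this (my addition, in NumPy): halving $h = x - 2$ should shrink the error of the quadratic model by roughly a factor of $2^3 = 8$, i.e. the ratio $\text{error}/h^3$ settles near a constant (here about $e^2/6 \approx 1.23$).

```python
import numpy as np

# Second-order Taylor model of f(x) = exp(x) about x = 2,
# as given above: g(x) = e^2 (0.5 x^2 - x + 1).
f = np.exp
g = lambda x: np.exp(2.0) * (0.5 * x**2 - x + 1.0)

for h in [0.1, 0.05, 0.025, 0.0125]:
    err = abs(f(2.0 + h) - g(2.0 + h))
    # err / h^3 approaches a constant (about e^2 / 6), which is
    # exactly what f(x) = g(x) + O((x - 2)^3) asserts near x = 2.
    print(f"h = {h:<7} error = {err:.3e}   error / h^3 = {err / h**3:.4f}")
```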

LinAlg

You don't need the third-order approximation. What you need is the mean value theorem, applied first to the original function and then to its derivative. If $g$ is a function of one variable with $g'(0)=0$, then
$$g(1)-g(0)=g'(t), \quad 0<t<1$$
$$=g'(t)-g'(0)$$
$$=tg''(\tau), \quad 0<\tau<t<1.$$
If $\phi$ is a function of $n$ variables, $P_0$ is a point in $\mathbb{R}^n$, $v$ is a vector with $n$ components, and we define $\omega$ by $\omega(t)=\phi(P_0+tv)$, then
$$\omega'(t)=v\cdot\nabla\phi(P_0+tv).$$
Thus if $f$ is a function of $n$ variables and we define $g(t)=f(P_0+tv)$, we have
$$g'(t)=v\cdot\nabla f(P_0+tv)$$
and
$$g''(t)=v\,\mathrm{Hess}\,f(P_0+tv)\,v^T.$$
So if we take a point $P_1$ and set $v=P_1-P_0$, we have
$$f(P_1)-f(P_0)=g(1)-g(0)=t\,v\,\mathrm{Hess}\,f(P_0+\tau v)\,v^T, \quad 0<\tau<t<1.$$
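
A finite-difference sketch of the chain-rule identities above (my addition; the particular $f$, $P_0$, and $v$ are arbitrary choices):

```python
import numpy as np

# A concrete smooth function of two variables (arbitrary choice).
def f(p):
    x, y = p
    return np.sin(x) * y + x**2 * y**2

def grad_f(p):
    x, y = p
    return np.array([np.cos(x) * y + 2 * x * y**2,
                     np.sin(x) + 2 * x**2 * y])

def hess_f(p):
    x, y = p
    return np.array([[-np.sin(x) * y + 2 * y**2, np.cos(x) + 4 * x * y],
                     [np.cos(x) + 4 * x * y,     2 * x**2]])

P0 = np.array([0.3, -0.7])
v = np.array([1.1, 0.4])
g = lambda t: f(P0 + t * v)

t = 0.5
eps = 1e-6    # step for the first central difference
eps2 = 1e-4   # larger step for the second difference (less rounding noise)

# g'(t) = v . grad f(P0 + t v): central difference vs. chain rule.
print((g(t + eps) - g(t - eps)) / (2 * eps), v @ grad_f(P0 + t * v))

# g''(t) = v Hess f(P0 + t v) v^T: second difference vs. chain rule.
print((g(t + eps2) - 2 * g(t) + g(t - eps2)) / eps2**2,
      v @ hess_f(P0 + t * v) @ v)
```

The two numbers on each line agree to within finite-difference error.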

P. Lawrence
  • But that's my point: The textbook says "second-order approximation", so where did the $O(h^3)$ come from? – The Pointer Jan 25 '20 at 12:19
  • @ThePointer $O(h^3)$ is not part of the second-order approximation if that's what you're getting at. With $f(x^*+hy)$ on the left hand side, you cannot omit the $O(h^3)$ on the right hand side, as otherwise the two expressions are not equal – LinAlg Jan 28 '20 at 02:06

Here's a bit more on local minima and second-/third-order approximations in several variables, extending my first answer by one more derivative.

First, for showing something is a 'loose' local min, i.e. $f(P_1) \ge f(P_0)$, knowing that a derivative of whatever order is non-negative isn't all that helpful, because being close to something non-negative is not a guarantee of being non-negative. The situation is much better in the case of a strict local min, because being close enough to something positive guarantees being positive.

In the case of a function $g$ of one variable with $g'(0)=0$, we have
$$g(1)=g(0)+\frac{1}{2!}g''(0)+\frac{1}{3!}g'''(t), \quad 0<t<1.$$
As in the first answer, we extend to a function $f$ of $n$ variables by taking two points $P_0$ and $P_1$, defining $v=P_1-P_0=[v_1,\dots,v_n]$ (the vector obtained by subtracting the coordinates of $P_0$ from those of $P_1$) and defining $g(t)=f(P_0+tv)$. Assuming $\nabla f(P_0)=0$, so that $g'(0)=0$, this gives
$$f(P_1)=f(P_0)+\frac{1}{2!}\,v\,\mathrm{Hess}\,f(P_0)\,v^T+\frac{1}{3!}\sum_{1\le i,j,k \le n} f_{ijk}(P_0+tv)\,v_iv_jv_k, \quad 0<t<1.$$
The problem with the last term, which we'll call a cubic expression (the term 'cubic form' has another meaning in the literature), is that I haven't been able to find much in the literature on such expressions, for which there are three possible explanations: (i) my literature search has been inadequate; (ii) not much can be done with such expressions; (iii) the topic would be a good area for research. In any event, it seems difficult to use this expression to determine whether $f$ has a local min at $P_0$. The only use I have found for such cubic expressions is in finding the torsion at a point on a space curve defined as the intersection of two surfaces $F(x,y,z)=0$ and $G(x,y,z)=0$, which seems not very closely related to local extrema.
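
As a small sketch of this cubic expression (my addition; the function is an arbitrary choice): for a homogeneous cubic $f$, expanding about $P_0 = 0$ makes the gradient and Hessian terms vanish, so the cubic expression accounts for the entire change in $f$.

```python
import numpy as np
from itertools import product

# f(x, y) = x^3 + x*y^2 is a homogeneous cubic, so at P0 = 0 both the
# gradient and the Hessian vanish and only the cubic term remains.
def f(p):
    x, y = p
    return x**3 + x * y**2

# Constant third partial derivatives f_ijk (index 0 = x, 1 = y).
f3 = np.zeros((2, 2, 2))
f3[0, 0, 0] = 6.0                               # f_xxx
f3[0, 1, 1] = f3[1, 0, 1] = f3[1, 1, 0] = 2.0   # f_xyy and permutations

P0 = np.zeros(2)
P1 = np.array([0.4, -0.9])
v = P1 - P0

# (1/3!) * sum over i, j, k of f_ijk * v_i * v_j * v_k
cubic = sum(f3[i, j, k] * v[i] * v[j] * v[k]
            for i, j, k in product(range(2), repeat=3)) / 6.0

print(f(P1) - f(P0), cubic)  # equal: the expansion is exact for a cubic f
```

Flipping the sign of $v_1$ flips the sign of the cubic term, so this $P_0$ is not a local minimum even though the FONC and SONC both hold, which illustrates why the cubic expression alone is hard to use for classifying critical points.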

P. Lawrence
  • I don't think that either of your answers is correct. In fact, they're so unclear that I struggled to understand them and/or their relevance to my question at all. See my response to LinAlg's comment for the answer to my question. This is the relevant question: https://math.stackexchange.com/q/501735/356308 But I appreciate the effort you put towards answering my question; thank you for that. – The Pointer Jan 31 '20 at 15:16