
Let's approximate $\log x$ on the interval $(0,1)$ by a power function $a x^b$, minimizing the integral of the squared difference $$\delta_0(a,b)=\int_0^1\left(\log x-a x^b\right)^2dx.\tag1$$ It is easy to verify that the minimum is attained at $a_0=-\frac34,\,b_0=-\frac13$, which gives the approximation $$\log x=-\tfrac34x^{-1/3}+\mathcal R(x),\tag2$$ where $\mathcal R(x)$ is the error term.

Now let's approximate $\mathcal R(x)$ in turn by a power function $a x^b$, minimizing $$\delta_1(a,b)=\int_0^1\left(\mathcal R(x)-a x^b\right)^2dx=\int_0^1\left(\log x-\left(-\tfrac34x^{-1/3}+a x^b\right)\right)^2dx.\tag3$$ The minimum is attained at $$\begin{align}a_1&=\frac{17}4-\sqrt{58} \sin \left[\frac13 \arctan \left(\frac{433}{33\sqrt7}\right)\right]\approx0.88760008404...,\\b_1&=\frac{1}{3}+\frac{4}{3} \sqrt{2} \cos \left[\frac{1}{3} \arctan\left(\frac{\sqrt{7}}{11}\right)\right]\approx2.21311796239...,\end{align}\tag4$$ which are algebraic numbers of degree $3^\dagger$.

Repeating this process once more yields the next term $a_2x^{b_2}$, where $$a_2\approx-0.1406322691...,\, b_2\approx-0.2430593194...\tag5$$ are algebraic numbers of degree $15^\ddagger$, for which I don't know any closed form. Subsequent steps similarly produce pairs of algebraic numbers of higher degree, resulting in an approximation of $\log x$ on the interval $(0,1)$ by a generalized power series $$\log x\approx-\tfrac34x^{-1/3}+a_1x^{b_1}+a_2x^{b_2}+\dots,\tag6$$ where each successive term further decreases the integral of the squared error. The powers $b_n$ and coefficients $a_n$ are not monotone and do not exhibit any clear pattern (although the coefficients generally tend to decrease in absolute value, with some sporadic spikes).
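
For a quick sanity check, the first two steps can be reproduced numerically with the sketch below (the quadrature settings, starting points, and bounds are ad hoc choices; bound-constrained Nelder–Mead needs SciPy ≥ 1.7):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

def delta(params, prev_terms):
    """Integral over (0,1) of the squared error after adding the term a*x**b."""
    a, b = params
    err = lambda x: (np.log(x)
                     - sum(ai * x**bi for ai, bi in prev_terms)
                     - a * x**b) ** 2
    return quad(err, 0.0, 1.0, limit=200)[0]

# Step 0: expect (a0, b0) = (-3/4, -1/3).  The bound b > -1/2 keeps the
# integral of x**(2b) finite.
res0 = minimize(delta, x0=[-1.0, -0.4], args=([],),
                method="Nelder-Mead", bounds=[(-5, 5), (-0.49, 5)])
print(res0.x)  # ~ [-0.75, -0.3333]

# Step 1: fit the residual R(x); expect (a1, b1) ~ (0.88760, 2.21312) as in (4).
res1 = minimize(delta, x0=[1.0, 2.0], args=([(-0.75, -1/3)],),
                method="Nelder-Mead", bounds=[(-5, 5), (-0.49, 5)])
print(res1.x)  # ~ [0.8876, 2.2131]
```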


Question: What does the series $(6)$ converge to? Does it converge to $\log x$ on any interval?


If it does converge to $\log x$, then empirically the convergence appears to be quite slow and erratic.

[Figure: $\log x$ and its approximation]


${^\dagger}$ The corresponding minimal polynomials are $\small64 z^3-816 z^2+684 z-9$ (for $a_1$) and $\small9 z^3-9 z^2-21 z-7$ (for $b_1$).
${^\ddagger}$ The corresponding minimal polynomials are $\small5035261952 z^{15}+180729937920 z^{14}+19190513664 z^{13}-60948402536448 z^{12}-383744783499264 z^{11}+6281308897579008 z^{10}+50474690060451840 z^9-155303784466089984 z^8-1906255797863421024 z^7+805421030545306296 z^6+670389754270702752 z^5+127003127714790264 z^4+8514399973766202 z^3+130643635592430 z^2-127629387774 z-79827687$
and
$\small118098 z^{15}-1299078 z^{14}-15628302 z^{13}-52936335 z^{12}-55068660 z^{11}+119832291 z^{10}+512627130 z^9+898647291 z^8+984822786 z^7+742152591 z^6+396538632 z^5+150470676 z^4+39697272 z^3+6920496 z^2+715716 z+33172.$
Although the polynomials look scary, they are quite nice in some sense, e.g. $\small5035261952=2^{21}\cdot7^4$ and $\small79827687=3^8\cdot23^3,$ and they can also be factored into quintics over some $\mathbb Q[q]$ with $q$ expressible in radicals.
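
As a quick numerical confirmation (a sketch with mpmath; the 50-digit working precision is arbitrary), the closed forms in $(4)$ indeed annihilate the two cubics from the $\dagger$ footnote:

```python
from mpmath import mp, sqrt, sin, cos, atan

mp.dps = 50  # 50 digits of working precision

# The closed forms for a1 and b1 from (4).
a1 = mp.mpf(17)/4 - sqrt(58)*sin(atan(mp.mpf(433)/(33*sqrt(7)))/3)
b1 = mp.mpf(1)/3 + mp.mpf(4)/3*sqrt(2)*cos(atan(sqrt(7)/11)/3)

print(a1)                                 # 0.88760008404...
print(b1)                                 # 2.21311796239...
print(64*a1**3 - 816*a1**2 + 684*a1 - 9)  # ~ 1e-48, i.e. zero up to precision
print(9*b1**3 - 9*b1**2 - 21*b1 - 7)      # ~ 1e-48, i.e. zero up to precision
```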

    I guess it converges to $\log(x)$ in $L^2((0, 1))$ since even just the polynomials are dense in $L^2((0, 1))$. But to prove this you would need to connect your series expansion with orthogonal projections onto subspaces of the span of the power functions. – Mason Jun 02 '24 at 01:12
    Although a series of this form can converge to $\log x$, it doesn't obviously mean that this particular procedure will produce a series that converges to it, right? – Vladimir Reshetnikov Jun 02 '24 at 19:30
  • It could be interesting to post this problem in https://stats.stackexchange.com/ to see how they would react – Claude Leibovici Jun 03 '24 at 09:37
  • @ClaudeLeibovici How is it related to statistics? – Vladimir Reshetnikov Jun 03 '24 at 19:08
  • Frankly speaking, fitting residuals after each step does not make me very comfortable. – Claude Leibovici Jun 04 '24 at 06:05
  • @ClaudeLeibovici Seems quite natural to me. This is how the Fourier series works, for example. – Vladimir Reshetnikov Jun 06 '24 at 16:27
  • Maybe you could get some ideas from this question I asked a few months ago – Joako Jun 08 '24 at 08:20
  • @ClaudeLeibovici "fitting residuals after each step does not make me very comfortable" For example, in two terms approximation, let $f(x) = \ln x, g(x) = -\frac34 x^{-1/3} + a_1 x^{b_1}$, and $h(x) = \frac{57344}{7225}(x^{1/16} - x^{-1/16})$. We have $I_1 = \int_0^1 (f(x) - g(x))^2,\mathrm{d} x \approx 0.1673$, and $I_2 = \int_0^1 (f(x) - h(x))^2,\mathrm{d} x = 87154/469805625 \approx 0.0001855$. My $h(x)$ is related to $\int_0^1 (\ln x - (ax^b + cx^d))^2,\mathrm{d} x$. – River Li Jun 08 '24 at 12:54
  • @ClaudeLeibovici So fitting residuals after each step in this example (the two-term approximation) seems not as good as fitting $\ln(x)$ by $ax^b + cx^d$. – River Li Jun 08 '24 at 14:23
  • Are you looking for any approximation by $ax^b$, or for more insight into the question's series? – Тyma Gaidash Jun 08 '24 at 18:33
  • Don't know if this may help, but $\delta_0(a,b)$ is an expectation with respect to the uniform r.v. $X$ on $(0,1)$, i.e. $\delta_0(a,b)=\mathbf{E}\left(\log(X)-aX^b\right)^2$. Some simple upper and lower bounds can be derived. Since $\mathbf{Var}\left(\log(X)-aX^b\right) \ge 0$, we get $\delta_0(a,b) \ge \left(\mathbf{E}\log(X)- \mathbf{E}\,aX^b\right)^2=\left(-1-\frac{a}{b+1}\right)^2$ (note $\mathbf{E}\log(X)=-1$). – rrv Jun 12 '24 at 17:03
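
A numerical check of River Li's two-term comparison above (a sketch; the constants for $g$ and $h$ are taken from $(4)$ and from the comment):

```python
import numpy as np
from scipy.integrate import quad

a1, b1 = 0.88760008404, 2.21311796239    # greedy step-1 optimum from (4)
g = lambda x: -0.75*x**(-1/3) + a1*x**b1  # greedy two-term fit
c = 57344/7225
h = lambda x: c*(x**(1/16) - x**(-1/16))  # jointly optimized two-term fit

I1 = quad(lambda x: (np.log(x) - g(x))**2, 0, 1, limit=200)[0]
I2 = quad(lambda x: (np.log(x) - h(x))**2, 0, 1, limit=200)[0]
print(I1)  # ~ 0.1673
print(I2)  # ~ 0.0001855  (= 87154/469805625)
```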

1 Answer


If we suppose we have already computed $a_0, \ldots, a_{n-1}$ and $b_0, \ldots, b_{n-1} > -\frac12$ (the lower bound is needed for the integrals below to converge), then we can derive formulas for $a_n$ and $b_n$ in terms of the preceding values. To that end, we first compute the distance function with the additional term \begin{align} \delta &= \int_0^1 \left(\log x - \sum_{i=0}^n a_i x^{b_i} \right)^2 \text dx \\ &= \int_0^1 (\log x)^2 \text dx -2 \int_0^1 \log x \sum_{i=0}^n a_i x^{b_i} \text dx + \int_0^1\left(\sum_{i=0}^n a_i x^{b_i} \right)^2 \text dx \\ &= 2 -2 \sum_{i=0}^n a_i \int_0^1 \log(x) x^{b_i} \text dx + \sum_{i,j=0}^n a_i a_j \int_0^1x^{b_i + b_j} \text dx \\ &= 2 -2 \sum_{i=0}^n a_i \left(\underbrace{\left[\log(x) \frac{x^{b_i+1}}{b_i+1}\right]_0^1}_{=0} - \int_0^1 x^{-1} \frac{x^{b_i+1}}{b_i+1} \text dx \right) + \sum_{i,j=0}^n \frac{a_i a_j}{b_i+b_j+1} \\ &= 2 +2 \sum_{i=0}^n \frac{a_i}{(b_i+1)^2} + 2\sum_{i=0}^n \sum_{j=0}^{i-1} \frac{a_i a_j}{b_i+b_j+1} + \sum_{i=0}^n \frac{a_i^2}{2b_i+1}. \end{align}

Then we compute the gradient: \begin{align} \frac{\text d\delta}{\text da_n} = \frac{2}{(b_n+1)^2} + 2\sum_{j=0}^{n-1} \frac{a_j}{b_n+b_j+1} + 2 \frac{a_n}{2b_n+1}, \end{align} \begin{align} \frac{\text d\delta}{\text db_n} = -4\frac{a_n}{(b_n+1)^3} - 2a_n\sum_{j=0}^{n-1} \frac{a_j}{(b_n+b_j+1)^2} - 2 \frac{a_n^2}{(2b_n+1)^2}. \end{align}

Setting the second equation to $0$ and dividing by $-2a_n$ (assuming $a_n \neq 0$) gives $$ a_n = (2b_n+1)^2 \left( -\frac{2}{(b_n+1)^3} - \sum_{j=0}^{n-1} \frac{a_j}{(b_n+b_j+1)^2} \right), $$ which we can substitute into the first equation, set to $0$ and divided by $2$: \begin{align} 0 &=\frac{1}{(b_n+1)^2} + \sum_{j=0}^{n-1} \frac{a_j}{b_n+b_j+1} + (2b_n+1) \left( -\frac{2}{(b_n+1)^3} - \sum_{j=0}^{n-1} \frac{a_j}{(b_n+b_j+1)^2} \right) \\ &= \frac{b_n+1 - 2(2b_n+1)}{(b_n+1)^3} + \sum_{j=0}^{n-1} \frac{a_j(b_n+b_j+1) - (2b_n+1)a_j}{(b_n+b_j+1)^2} \\ &= -\frac{3b_n+1}{(b_n+1)^3} + \sum_{j=0}^{n-1} \frac{a_j(b_j-b_n)}{(b_n+b_j+1)^2}. \end{align}

This equation can of course be transformed into an equivalent polynomial equation in $b_n$. To find $b_n$, we solve that polynomial equation and then pick the real root $> -\frac12$ that actually minimizes $\delta$, for example by comparing the values of $\delta$ at the candidate roots or by checking second-order conditions via the Hessian.
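
Here is a short numerical sketch of this recursion (the scan grid, its upper bound, and the use of SciPy's `brentq` are ad hoc choices). Since $\delta$ is quadratic in $a_n$, adding the new term lowers it by exactly $a_n^2/(2b_n+1)$, so among the stationary points we keep the one with the largest such decrease:

```python
import numpy as np
from scipy.optimize import brentq

def next_term(a, b):
    """Greedy step: given previous coefficients a[] and exponents b[],
    return (a_n, b_n) for the next term."""
    def F(t):  # the stationarity equation in b_n derived above
        return (-(3*t + 1) / (t + 1)**3
                + sum(aj * (bj - t) / (t + bj + 1)**2 for aj, bj in zip(a, b)))

    # Bracket the real roots with t = b_n > -1/2 (needed for convergence).
    grid = np.linspace(-0.499, 10, 5000)
    roots = [brentq(F, u, v) for u, v in zip(grid, grid[1:]) if F(u) * F(v) < 0]

    candidates = []
    for t in roots:
        an = (2*t + 1)**2 * (-2 / (t + 1)**3
                             - sum(aj / (t + bj + 1)**2 for aj, bj in zip(a, b)))
        candidates.append((an**2 / (2*t + 1), an, t))  # (decrease of delta, a_n, b_n)
    _, an, bn = max(candidates)  # keep the stationary point that lowers delta most
    return an, bn

a, b = [], []
for n in range(3):
    an, bn = next_term(a, b)
    a.append(an); b.append(bn)
    print(n, an, bn)
# n=0 -> (-0.75, -1/3); n=1 -> (0.88760..., 2.21312...);
# n=2 should reproduce (5): a_2 ~ -0.14063, b_2 ~ -0.24306
```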


For $n=0$, the condition $\frac{3b_0+1}{(b_0+1)^3} = 0$ gives $b_0 = -1/3$. Then $$ a_0 = -2\frac{(2b_0 + 1)^2}{(b_0+1)^3} = -\frac{3}{4}. $$ For $n=1$, $b_1$ satisfies $$ \frac{3b_1+1}{(b_1+1)^3} = \frac{a_0(b_0 - b_1)}{(b_0 + b_1 + 1)^2} \iff 27 b_1^4 - 18 b_1^3 - 72 b_1^2 - 42 b_1 - 7 = 0. $$ This gives four candidates for $b_1$ among $\{-1/3, -0.735.., -0.478.., 2.21.. \}$. Only the roots greater than $-1/2$ keep $\delta$ finite, and among those $b_1 \approx 2.21$ yields the minimum, in agreement with $(4)$.
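
As a symbolic cross-check (a sketch using sympy), clearing denominators in the $n=1$ stationarity equation recovers this quartic, and factoring it exposes the cubic $9z^3-9z^2-21z-7$, i.e. the minimal polynomial of $b_1$ quoted in the question's footnote:

```python
import sympy as sp

t = sp.symbols('t')
a0, b0 = sp.Rational(-3, 4), sp.Rational(-1, 3)

# Stationarity equation for n = 1: F(t) = 0 with t = b_1.
F = -(3*t + 1)/(t + 1)**3 + a0*(b0 - t)/(t + b0 + 1)**2

num = sp.numer(sp.cancel(F))
print(sp.expand(num))  # 27*t**4 - 18*t**3 - 72*t**2 - 42*t - 7, up to a constant factor
print(sp.factor(num))  # (3*t + 1)*(9*t**3 - 9*t**2 - 21*t - 7), up to a constant factor
```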

Nolord
  • I did not go into the details of your answer, but this is not what the OP wants to do. Could you show the results of your approach for $n=2$ or $n=3$? Thanks – Claude Leibovici Jun 08 '24 at 13:48
  • Quoting the bounty message "I don't necessarily expect a complete answer. Any interesting insights are welcome." – Nolord Jun 08 '24 at 15:21
  • I think that I misexplained my point. What you do is exactly what I had in mind, that is to say, minimizing the norm over all parameters at the same time. What the OP seems to have in mind is to find the parameters $(a_i, b_i)$ one at a time. – Claude Leibovici Jun 09 '24 at 05:15
  • Maybe I misexplained what I was doing as well, I'll edit. – Nolord Jun 09 '24 at 06:54