1

The geometric mean for some width $w$ is the $w$-th root of the product of the $w$ most recent data points.

$$ M_t = \sqrt[w]{x_t \otimes x_{t-1} \otimes \cdots \otimes x_{t-(w-1)} } $$

This can be calculated in a rolling fashion by dividing out factors as they exit and multiplying in factors as they arrive:

$$ M'_t = M'_{t-1} \otimes \sqrt[w]{x_{t} \oslash x_{t-w}} $$

(where $\otimes$ is IEEE-754 floating point multiplication and $\oslash$ is IEEE-754 floating point division).

However, due to the limitations of IEEE-754 floating point precision, $M'_t$ can drift arbitrarily far from $M_t$ over time. Is it possible to correct for this drift such that $M'_t$ stays within a fixed relative bound of $M_t$ indefinitely in better than $O(w)$ per increment, and if so, how?

(Obviously, in practice, $w$ will be reasonably small, and it's necessary to keep a buffer of the $w$ most recent data points anyway, so it'll be simpler and likely faster just to recompute the proper geometric mean each time. But I'm interested in the math and process of a “proper” rolling solution.)

CAD97
  • 113
  • 4

2 Answers2

1

First you obviously need to scale all your numbers so you don't get overflow or underflow. Of course if your numbers are limited in exponent range then this might be guaranteed; like the product of 20 numbers up to $10^{12}$ will never overflow. You might scale by a factor $2^{kw}$ and multiply the result by $2^k$ with no rounding error at all.

Now the trick would be that your calculation must never depend on too many inputs. So you calculate the product of the first w numbers. Then you calculate the first w products as you did, while simultaneously forming the product of the next w products. Then you replace your product against the freshly calculated product, and after every w numbers you do the same thing. So your product is always the product of w numbers, multiplied by up to w quotients of numbers. So if you have a billion numbers, no problem.

gnasher729
  • 32,238
  • 36
  • 56
0

This is just a random thought, but it may be worth separating out the mantissa and exponent, and keeping track of them separately.

Denote:

$$x_i = m_i \times 2^{e_i}$$

Then:

$$\sqrt[n]{\prod_i x_i} = \sqrt[n]{\prod_i m_i} \times 2^{\left(\sum_i e_i / n\right)}$$

The convention of the C (and C++) function frexp is that it decomposes a floating point number in such a way that its mantissa is in the range $[\frac{1}{2},1)$ rather than the more usual convention of putting the mantissa in the range $[1,2)$.

Using this convention ensures that all of the multiplications and root extractions leave all of the working calculations in the "sweet spot" for IEEE-754. It is a fun and under-appreciated fact that over half of all IEEE-754 numbers are in the range $[-1,1]$.

Pseudonym
  • 24,523
  • 3
  • 48
  • 99