maximum span between two numbers

Question

We have an array of n measurements: (m_0, m_1, ..., m_(n-1)). The measurements are voltages, but that's not important, so let's say they are just numbers. The measurements can be positive, negative or zero. There are no NaNs or infinities, only "regular" numbers. We are given a certain non-negative threshold.

We want to find two indices (i, j) in [0, n-1] such that:

j >= i
m_j >= m_i
m_j - m_i does not exceed threshold
j - i is maximum
if there are several pairs which satisfy the above conditions and for which j - i is equal, any of those pairs is OK

Examples:

the measurements are (-1.5, -3.6, 2.2, 0, 3.0), threshold = 3.8 => return (0, 3)
the measurements are (1.1, 0, -1.5, -1.7), threshold = 1 (or any number >= 0) => return (0, 0), or (1, 1), or (2, 2), or (3, 3)

We can easily come up with a brute force solution: for each i in [0, n-1], examine each j in [i+1, n-1], and remember the best answer. This is O(n^2). We currently have no more than tens of thousands of measurements, so O(n^2) is kind of OK, but the maximum number of measurements is expected to grow up to millions, perhaps tens of millions. Then, O(n^2) will no longer be OK.
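For reference, the brute force described above could look roughly like the sketch below. The function name `brute_force_widest_pair` is made up for illustration, and the input is assumed to be a plain Python sequence of floats.

```python
def brute_force_widest_pair(m, threshold):
    """Return (i, j) with j >= i, 0 <= m[j] - m[i] <= threshold and j - i maximal.

    O(n^2) reference implementation; returns None for an empty input.
    """
    n = len(m)
    if n == 0:
        return None
    best = (0, 0)  # (i, i) always satisfies the conditions
    for i in range(n):
        for j in range(i + 1, n):
            diff = m[j] - m[i]
            if 0 <= diff <= threshold and j - i > best[1] - best[0]:
                best = (i, j)
    return best
```

It is only useful as a baseline to cross-check a faster implementation on small inputs.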

We are wondering whether there's a better algorithm. I doubt one can do this in O(n), but perhaps in O(n*log(n)) by sorting the measurements in some clever way?

D.W. · Accepted Answer · 2023-01-31T07:59:27.240

I think you can solve this problem in $O(n \log n)$ time. I'll list two solutions.

Pareto optimality + binary search

In a preliminary stage, find the Pareto front, where a pair $(j,m_j)$ is better (dominates another pair) if $j$ is larger and $m_j$ is smaller. This can be done in $O(n \log n)$ time: see "How do you compute the Pareto Front of a set?" and "minimum subset of dominating 2D points".

Put the Pareto optimal solutions in an array $P$. This array will be sorted by increasing $j$ and, because every point on the front has a smaller $m_j$ than all points after it, by increasing $m_j$ as well.

Now repeat the following, for each $i := 0,1,\dots,n-1$:

Find the largest $j$ such that $j \ge i$ and $m_j \le m_i + \text{threshold}$. This can be done by locating index $i$ in $P$ and then binary searching on the $m_j$ values of the entries from that position onward.
If $j-i$ is larger than any previously remembered pair, remember the pair $i,j$.

Finally, output the best pair $i,j$ seen during this procedure (i.e., the one where $j-i$ is largest).

Each binary search takes $O(\log n)$ time, and you do $n$ of them, so the total running time is $O(n \log n)$.
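A minimal sketch of these steps, under a few assumptions: the function name `widest_pair_upper_bound_only` is made up, and the front is built as the strict suffix minima while scanning from the right, so its $m_j$ values increase with $j$. Because the threshold is non-negative, the last front entry with $m_j \le m_i + \text{threshold}$ always has $j \ge i$, so one `bisect_right` per $i$ is enough. As in the step above, only the upper bound $m_j \le m_i + \text{threshold}$ is enforced; the question's extra requirement $m_j \ge m_i$ is not handled here.

```python
from bisect import bisect_right

def widest_pair_upper_bound_only(m, threshold):
    """Largest j - i with j >= i and m[j] <= m[i] + threshold (upper bound only)."""
    if not m:
        return None
    # Pareto front: keep j when m[j] is strictly below every later value.
    front_j, front_m = [], []
    smallest = float("inf")
    for j in range(len(m) - 1, -1, -1):
        if m[j] < smallest:
            smallest = m[j]
            front_j.append(j)
            front_m.append(m[j])
    front_j.reverse()  # now increasing in j ...
    front_m.reverse()  # ... and increasing in m[j]

    best = (0, 0)
    for i, mi in enumerate(m):
        # Last front entry whose value does not exceed m[i] + threshold.
        k = bisect_right(front_m, mi + threshold) - 1
        j = front_j[k]
        if j - i > best[1] - best[0]:
            best = (i, j)
    return best
```

On the question's first example this returns (0, 3). On the second example it returns (0, 3) as well, which violates $m_j \ge m_i$, so the lower bound would need separate handling.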

Please check the details on this, to be sure whether it actually works. I have not checked it carefully.

Self-balancing binary search tree

Alternatively, store all the $(j,m_j)$ values in a self-balancing binary search tree, keyed on $m_j$, with each $(j,m_j)$ value stored in a leaf, so the leaves are sorted by the value of $m_j$. Augment this tree so that each internal node $x$ stores the maximum of all $m_j$ values in the leaves under $x$, and also stores the maximum of all $j$ values in the leaves under $x$.

Now, for $i=0$, let's find the largest $j$ such that $m_j-m_i \le \text{threshold}$. You can do this as follows: there is a set of $O(\log n)$ subtrees whose union contains all the leaves with $m_j$ values satisfying $m_j \le m_i + \text{threshold}$ (and no other leaves); use the augmentations in the parent nodes for those subtrees to find the largest value of $j$ among all of those subtrees; and output $j-i$.

Next, delete the leaf $(i, m_i)$ from the tree, set $i:=1$, and repeat.

In each step, we delete one element from the tree, increment $i$, and examine $O(\log n)$ subtrees.

Finally, we keep the best pair that was outputted (i.e., the pair with the largest value of $j-i$).

Since all operations on a self-balancing binary search tree can be done in $O(\log n)$ time, and you can find the $O(\log n)$ subtrees in $O(\log n)$ time, the total running time for this algorithm is $O(n \log n)$.
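A rough sketch of the same idea, with some substitutions that are mine rather than part of the description above: instead of a self-balancing binary search tree it uses a static array-based segment tree whose leaves are the measurements sorted by $m_j$ and whose internal nodes store the maximum $j$ below them. It queries the value range $[m_i, m_i + \text{threshold}]$, which also enforces the question's condition $m_j \ge m_i$, and it skips the deletion step because $i$ itself always lies in that range, so the maximum $j$ found is never smaller than $i$. The name `widest_pair_range_max` is hypothetical.

```python
from bisect import bisect_left, bisect_right

def widest_pair_range_max(m, threshold):
    """Largest j - i with j >= i and m[i] <= m[j] <= m[i] + threshold."""
    n = len(m)
    if n == 0:
        return None
    order = sorted(range(n), key=lambda j: m[j])  # leaves sorted by m[j]
    values = [m[j] for j in order]

    size = 1
    while size < n:
        size *= 2
    tree = [-1] * (2 * size)          # tree[node] = max index j in its subtree
    for rank, j in enumerate(order):
        tree[size + rank] = j
    for node in range(size - 1, 0, -1):
        tree[node] = max(tree[2 * node], tree[2 * node + 1])

    def max_index(lo, hi):
        """Maximum j among leaves with rank in [lo, hi); visits O(log n) nodes."""
        result = -1
        lo += size
        hi += size
        while lo < hi:
            if lo & 1:
                result = max(result, tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                result = max(result, tree[hi])
            lo //= 2
            hi //= 2
        return result

    best = (0, 0)
    for i, mi in enumerate(m):
        lo = bisect_left(values, mi)               # first leaf with m[j] >= m[i]
        hi = bisect_right(values, mi + threshold)  # past last leaf within threshold
        j = max_index(lo, hi)
        if j - i > best[1] - best[0]:
            best = (i, j)
    return best
```

Sorting costs $O(n \log n)$ and each of the $n$ queries touches $O(\log n)$ nodes, so the overall bound matches the one stated above.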
