
I am working on a project and came across the following problem I have to solve.

Imagine we have time series data Ts, an array of pairs where each element is (Xi, Yi). The length of this array is a constant N, but the array is updated regularly. Each update consists of two steps:

  1. We remove Ts[0]
  2. Add a new pair (X, Y) to the end of Ts

This is essentially an online update to a sliding window of size N.

Now here comes the problem. At any given time (before or after an update), we are given two nonzero integers A and B. The goal is to find the index i between 0 and N-1 inclusive that maximizes the term (A-Xi)/(B-Yi).

Some additional constraints I would like to mention:

  1. (X0, X1, X2, ... X(N-1)) is non-decreasing, and Xi is a positive integer for all i.
  2. (Y0, Y1, Y2, ... Y(N-1)) is non-decreasing, and Yi is a positive integer for all i.
  3. A and B are both positive integers.
  4. For any query, we can safely assume B > Yi for every possible i. Technically, in the original problem B could be equal to the last value Y(N-1), but this is a boundary condition and we would simply exclude i = N-1 in that case.
  5. We can also safely assume A > Xi for every possible i. Unlike in constraint (4), A - Xi can never be 0.

A naive approach is to iterate through all N candidates and pick the best one (this is what I am doing now). But I am looking for a way to optimize this for very large N. Queries can arrive at any time and any number of times, for example:

  1. Query, Update, Query (1000 times), Update; or
  2. Query, Update, Query, Update, ...
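For concreteness, the naive approach can be sketched as below. This is only an illustration: the names `SlidingWindow`, `update`, and `query` are assumptions made for the example, and exact fractions are used to avoid floating-point ties. Each update is O(1) and each query is O(N).

```python
from collections import deque
from fractions import Fraction


class SlidingWindow:
    """Fixed-size window of (X, Y) pairs with a linear-scan query (illustrative names)."""

    def __init__(self, pairs):
        self.ts = deque(pairs)

    def update(self, x, y):
        # Step 1: remove Ts[0]; step 2: append the new pair at the end.
        self.ts.popleft()
        self.ts.append((x, y))

    def query(self, a, b):
        # Returns an index i maximizing (a - X_i) / (b - Y_i).
        # Assumes b > Y_i for every i, as stated in constraint (4).
        best_i, best = 0, None
        for i, (x, y) in enumerate(self.ts):
            ratio = Fraction(a - x, b - y)
            if best is None or ratio > best:
                best_i, best = i, ratio
        return best_i
```

On ties, this sketch returns the smallest maximizing index.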

Is there any room for optimization here? If there is none, I would also be interested in a proof from a more mathematical standpoint.

1 Answer


First, apply the standard reduction: to solve a ratio maximization problem (sometimes called "fractional programming"), binary search on the optimal value $t$:

$$ \max_x \frac{f(x)}{g(x)} \geq t \iff \max_x \bigl( f(x) - t\, g(x) \bigr) \geq 0, \quad \text{assuming } g(x) > 0 \; \forall x. $$
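To spell out why the reduction holds: since $g(x) > 0$, multiplying through by $g(x)$ does not flip the inequality, so for each individual $x$

$$ \frac{f(x)}{g(x)} \geq t \iff f(x) \geq t\, g(x) \iff f(x) - t\, g(x) \geq 0 . $$

Some $x$ attains a ratio of at least $t$ exactly when some $x$ makes $f(x) - t\, g(x)$ non-negative, which gives the statement with the maximum on both sides.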

Applying that to this problem,

$$ \begin{aligned} \max_i \bigl[ (A - X_i) - t (B - Y_i) \bigr] &= \max_i \bigl( A - X_i - t B + t Y_i \bigr) \\ &= \max_i (-X_i + t Y_i ) + (A - t B) \end{aligned} $$

so it is sufficient to maintain a data structure on $X, Y$ to answer queries of the form $Q_{X,Y}(t) := \max_i (-X_i + t Y_i)$.

When the data is viewed as 2-D points $\{(X_i,Y_i)\}$, a query maximizes a linear function over the point set. It is possible to maintain a convex hull of dynamic points that supports such queries in $O(\log N)$ time using $O(N)$ space [1].
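As a rough illustration of what such a query looks like, here is a static sketch, not the dynamic structure of [1]. It treats each point as a line $h_i(t) = Y_i t - X_i$, builds the upper envelope of those lines once in $O(N)$ (relying on the $Y_i$ being non-decreasing), and answers $Q_{X,Y}(t)$ by binary search in $O(\log N)$. The function names `build_envelope` and `query_envelope` are made up for this example, and the envelope would have to be rebuilt after every update.

```python
def _useless(l1, l2, l3):
    # With slopes m1 < m2 < m3, the middle line never attains the maximum
    # if l1 and l3 cross at or before the point where l1 and l2 cross.
    (m1, c1, _), (m2, c2, _), (m3, c3, _) = l1, l2, l3
    return (c1 - c3) * (m2 - m1) <= (c1 - c2) * (m3 - m1)


def build_envelope(points):
    """Upper envelope of the lines Y_i * t - X_i; requires Y non-decreasing."""
    hull = []  # entries are (slope, intercept, original index)
    for i, (x, y) in enumerate(points):
        line = (y, -x, i)
        if hull and hull[-1][0] == y:
            if hull[-1][1] >= -x:
                continue  # same slope and no better intercept: never wins
            hull.pop()
        while len(hull) >= 2 and _useless(hull[-2], hull[-1], line):
            hull.pop()
        hull.append(line)
    return hull


def query_envelope(hull, t):
    """Return (max_i(-X_i + t * Y_i), index attaining it) by binary search."""
    lo, hi = 0, len(hull) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        m1, c1, _ = hull[mid]
        m2, c2, _ = hull[mid + 1]
        if m1 * t + c1 < m2 * t + c2:
            lo = mid + 1  # the next line is still higher at t
        else:
            hi = mid
    m, c, i = hull[lo]
    return m * t + c, i
```

For example, with the points `[(1, 1), (2, 3), (4, 4)]` the lines are $t - 1$, $3t - 2$ and $4t - 4$, and a query at $t = 1$ returns the value 1 attained at index 1.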

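Let $M$ be the maximum integer value in the input. Then $O(\log M)$ iterations of the binary search suffice, because the candidate ratios are fractions whose numerators and denominators are bounded by $M$, so two distinct ratios differ by at least $1/M^2$. Therefore, the final runtime is $O(\log N \log M)$ per query, and $O(\log N)$ time per update.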
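The binary search itself might look like the sketch below. It is a minimal illustration under stated assumptions: `q_linear` is a hypothetical linear-scan stand-in for the $O(\log N)$ hull query, `argmax_ratio` is a name invented here, the inputs are integers with $B > Y_i$ and $A > X_i$, and exact `Fraction` arithmetic is used so that the stopping threshold $1/B^2$ is reliable.

```python
from fractions import Fraction


def q_linear(points, t):
    """Stand-in for the hull query: (max_i(-X_i + t * Y_i), argmax), by linear scan."""
    i = max(range(len(points)), key=lambda k: -points[k][0] + t * points[k][1])
    x, y = points[i]
    return -x + t * y, i


def argmax_ratio(points, a, b, q=q_linear):
    """Index i maximizing (a - X_i) / (b - Y_i), found by binary search on t."""
    # Invariant: best ratio >= lo and best ratio < hi.
    # All ratios are positive and at most a, since b - Y_i >= 1.
    lo, hi = Fraction(0), Fraction(a + 1)
    # Distinct ratios differ by at least 1 / b**2 (denominators are at most b).
    eps = Fraction(1, b * b)
    while hi - lo > eps:
        mid = (lo + hi) / 2
        value, _ = q(points, mid)
        if value + a - mid * b >= 0:  # some i has ratio >= mid
            lo = mid
        else:
            hi = mid
    # At t = lo only optimal indices have a non-negative value, so the
    # maximizer of -X_i + lo * Y_i is an optimal index.
    return q(points, lo)[1]
```

The loop runs about $\log_2((A + 1) B^2)$ times, which matches the $O(\log M)$ bound; replacing `q_linear` with a hull-based query is what would bring the per-query cost down to $O(\log N \log M)$.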

(The time complexity doesn't change, but it is possible to use a simpler incremental convex hull data structure by exploiting the sliding-window update pattern. It may also be possible to exploit the non-decreasing constraint (I'm thinking of the convex hull trick), but I'm not sure.)

  • [1] Brodal, Gerth Stølting, and Riko Jacob. "Dynamic planar convex hull." The 43rd Annual IEEE Symposium on Foundations of Computer Science (2002).
pcpthm