
Let $\sigma_0, \sigma_1, \sigma_2, \dots$ be a sequence in $\{-1,+1\}$ and $T \in \mathbb{N}$ a time horizon.

Consider the following game. At each time step, we are asked whether we want to give an answer $X_t \in \{-1,+1\}$ or to abstain and skip to the next turn. If at time $t$ we decide to give an answer $X_t \in \{-1,+1\}$, and $k(t)$ denotes the number of answers we have given strictly before time $t$, then we observe whether $X_t = \sigma_{k(t)}$ or not. If they are equal, the game proceeds; if not, the game stops. To help us determine the signs $\sigma_k$, at each time step $t$ we observe a random variable $Z_t$ which, conditioned on the history observed so far, is a $1$-subgaussian random variable with mean $\sigma_{k(t)} \cdot 2^{-k(t)}$ (again, $k(t)$ is the number of answers we have given before time $t$). Intuitively, the absolute values of the means of these random variables shrink geometrically as we give more and more correct answers, but their signs depend on the unknown quantities $\sigma_0, \sigma_1, \sigma_2,\dots$.
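Purely to fix ideas, here is a minimal sketch of the observation model in code. I take the $1$-subgaussian noise to be standard Gaussian (an assumption made only for simulation; any $1$-subgaussian law is allowed), and the name `observe` is mine:

```python
import numpy as np

def observe(sigma, k, rng):
    """One observation Z_t at a time t with k(t) = k answers already given.
    Gaussian noise is a stand-in for a general 1-subgaussian distribution;
    the conditional mean is sigma[k] * 2**(-k), as in the problem statement."""
    return sigma[k] * 2.0 ** (-k) + rng.standard_normal()
```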

Our losses will be measured in the following way. If at some point we have erroneously guessed, i.e. $X_t \neq \sigma_{k(t)}$, we immediately pay $T$. If not, and $t_1<\dots<t_K \le T$ are the turns before the time horizon $T$ at which we gave answers, we pay $$ t_1 + \sum_{k=2}^K (t_k-t_{k-1}) \cdot 2^{1-k} + (T-t_K) \cdot 2^{-K} \;.$$ In words, when we don't lose, we pay the cumulative sum of the absolute values of the means of the $1$-subgaussian random variables we have observed.
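A matching sketch of the loss, with the convention that an empty list of answer times means we pay $T$ (the mean magnitude stays $2^0=1$ throughout); the name `payment` is again mine:

```python
def payment(T, answer_times, failed):
    """Loss as defined above: T after a wrong guess, otherwise the weighted
    sum of segment lengths, with answer_times = [t_1, ..., t_K] increasing."""
    if failed:
        return T
    K = len(answer_times)
    if K == 0:
        return T                      # no answers: weight 2**0 = 1 on all T steps
    cost = answer_times[0]            # first segment, weight 2**0
    for k in range(2, K + 1):
        cost += (answer_times[k - 1] - answer_times[k - 2]) * 2.0 ** (1 - k)
    cost += (T - answer_times[-1]) * 2.0 ** (-K)
    return cost
```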

I'm wondering whether we can devise a sequential strategy that, on the basis of the observed $Z_1,Z_2,\dots$, pays in expectation only $O(\sqrt{T})$, regardless of how the $\sigma_0,\sigma_1, \sigma_2, \dots$ were chosen.

I tried to devise a strategy based on confidence intervals. Specifically, fix $\delta >0$. We provide answers at fixed times, namely at the times inductively defined by $$t_k := t_{k-1} + \bigg\lceil c\cdot 4^{k-1} \log\Big(\frac{2}{\delta}\Big) \bigg\rceil\;,$$ where by definition $t_0 := 0$ and $c$ is a universal constant whose role is explained below. Notice that if the game has gone on up to time $t_k$, then at that time we have seen $\big\lceil c \cdot 4^{k-1} \log\big(\frac{2}{\delta}\big) \big\rceil$ realizations of $1$-subgaussian random variables with mean $\sigma_{k-1} \cdot 2^{1-k}$. We then predict according to $$X_{t_k} :=\operatorname{sgn} \bigg( \sum_{t=t_{k-1}+1}^{t_k} Z_t \bigg).$$

The reason we do so is that, for a suitable universal choice of $c$, the open $\delta$-confidence interval around the corresponding empirical mean at time $t_k$ contains $\sigma_{k-1}\cdot 2^{1-k}$ with probability at least $1-\delta$, and its radius is (upper bounded by) $2^{1-k}$. It follows that, with this strategy, when $\sigma_{k-1} = 1$ the empirical mean is greater than $0$ with probability at least $1-\delta$, and when $\sigma_{k-1} = -1$ it is less than $0$ with probability at least $1-\delta$ (I hope I haven't made any mistakes in the calculations, but I believe the procedure is sound). By a union bound, we make a mistake with probability at most $\delta \cdot K$, where $K \approx \log_4\left(\frac{T}{c \cdot \log\left(\frac{2}{\delta}\right)}\right)$. On the other hand, when we make no mistake, using this expression for $K$ we pay $$ t_1 + \sum_{k=2}^K (t_k-t_{k-1}) \cdot 2^{1-k} + (T-t_K) \cdot 2^{-K} = O \bigg( \sqrt{T \log\Big(\frac{2}{\delta}\Big)}\bigg) \;.$$
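Here is a short simulation sketch of this fixed-times strategy, again with Gaussian noise standing in for the general $1$-subgaussian case and with `c` left as a tunable parameter rather than a proven constant:

```python
import numpy as np

def fixed_times_strategy(sigma, T, delta, c=8.0, rng=None):
    """Sketch of the fixed-times confidence-interval strategy described above.
    sigma is the hidden sign sequence; returns (answer_times, failed)."""
    rng = np.random.default_rng() if rng is None else rng
    t, k, answer_times = 0, 0, []
    while True:
        # k answers given so far: the next block has mean sigma[k] * 2**(-k)
        # and length ceil(c * 4**k * log(2/delta)).
        n_k = int(np.ceil(c * 4.0 ** k * np.log(2.0 / delta)))
        if t + n_k > T:
            return answer_times, False          # horizon reached, no mistake made
        block = sigma[k] * 2.0 ** (-k) + rng.standard_normal(n_k)
        t += n_k
        guess = 1 if block.sum() >= 0 else -1   # X_{t_k} = sgn of the block sum
        answer_times.append(t)
        if guess != sigma[k]:
            return answer_times, True           # wrong answer: the game stops
        k += 1
```

Feeding the output into the `payment` sketch above and averaging over many runs gives a Monte Carlo estimate of the expected loss for a fixed sign sequence.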

Overall, in expectation, we pay at most $O \bigg( \sqrt{T \log\Big(\frac{2}{\delta}\Big)} + \delta \cdot \log_4\left(\frac{T}{c \cdot \log\left(\frac{2}{\delta}\right)}\right) \cdot T\bigg)$, which, picking $\delta = \frac{1}{\sqrt{T}}$, leads to an expected loss of at most $O(\sqrt{T})$ up to logarithmic factors in $T$.
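To spell out the last substitution: with $\delta = \frac{1}{\sqrt T}$ we have $\log\frac{2}{\delta} = O(\log T)$ and $K = O(\log T)$, so the bound above becomes $$ O\Big( \sqrt{T\log T} \;+\; \tfrac{1}{\sqrt T}\cdot O(\log T)\cdot T\Big) = O\big(\sqrt{T}\,\log T\big)\;. $$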

And here is the question: can we do better and remove these extra logarithmic factors, perhaps using a less naive strategy, to achieve the $O(\sqrt{T})$ rate?

Bob

1 Answer


I’m afraid this is not possible.

Assume that the expected payment can be limited to $a\sqrt T$ for some constant $a$. We pay at least $T2^{-K}$, so we must have $T2^{-K}\le a\sqrt T$, and thus $K\ge\log_2\frac{\sqrt T}a$. Since failure results in a payment of $T$, the overall failure probability, and thus also the failure probability for each answer provided, must be bounded by $\frac a{\sqrt T}$. If we provide answers at fixed times, as in your strategy, then the number of observations with mean $\pm2^{-(K-1)}$ that we have to accumulate to limit the failure probability of the last answer to $\frac a{\sqrt T}$ is approximately $4^{K-1}\log\frac{\sqrt T}a$. But for those observations we have to pay

$$ 2^{-(K-1)}\cdot4^{K-1}\log\frac{\sqrt T}a=2^{K-1}\log\frac{\sqrt T}a\ge\frac12\frac{\sqrt T}a\log\frac{\sqrt T}a\;, $$

which is of order $\sqrt T\log T$.
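(To make the sample-size claim explicit: in the Gaussian case, the sign of the sum of $n$ observations with mean $\pm2^{-(K-1)}$ and variance $1$ is wrong with probability roughly $e^{-n\,4^{-(K-1)}/2}$, so pushing this below $\frac a{\sqrt T}$ requires, up to constants, $$ n\;\gtrsim\;4^{K-1}\log\frac{\sqrt T}a\;, $$ which is the count used above.)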

So $O\left(\sqrt T\right)$ isn’t possible with fixed times. We can certainly do a bit better than fixed times, since we can guess earlier if the observations imply a low failure probability. I would expect that the optimal strategy for limiting the failure probability on the last guess to $\frac a{\sqrt T}$ nevertheless costs of order $\sqrt T\log T$, but I don’t know how to prove that. What I can prove, though, is that its cost grows faster than $\sqrt T$.

Assume we have an optimal strategy for making the $(k+1)$-th guess such that the failure probability is limited to $p$, and it takes an expected number $b_{k+1}$ of observations. Group the observations in groups of $4$ and delay any guesses that the strategy prescribes within the groups until the end of their group. This costs at most $4$ additional observations. At the end of each group, we have more information than we would have had for making the delayed guesses, so we can guess at least as well as we would have without the delay. Now, since the sum of the observations is a sufficient statistic for the unknown parameter $\sigma_k$, we can discard the information in the individual observations and only retain the sum of the $4$ observations for each group. But if the observed random variable is normally distributed with variance $1$, those sums are distributed like the observations for the $k$-th guess (just scaled by $2$). (Since the strategy must work for all $1$-subgaussian random variables, it has to work in this case in particular.)
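As a quick numerical sanity check of this distributional identity (Gaussian case only; the example values and names below are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
k, sigma_k = 3, 1                # arbitrary example values
mu = sigma_k * 2.0 ** (-k)       # mean of an observation for the (k+1)-th guess
sums = (mu + rng.standard_normal((100_000, 4))).sum(axis=1)   # one sum per group of 4
# Each sum has mean 4*mu and standard deviation 2, i.e. it is distributed like
# 2 * N(sigma_k * 2**(-(k-1)), 1): twice an observation of the kind seen for the k-th guess.
print(sums.mean(), sums.std())   # approximately 0.5 and 2.0 here
```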

So this yields a strategy for making the $k$-th guess after $b_k$ observations, with $b_k\le\frac{b_{k+1}+4}4=\frac14b_{k+1}+1$, that also limits the failure probability to $p$. Going in the other direction, it follows that $b_{k+1}\ge4(b_k-1)$, and then solving the recurrence and using induction yields $b_k\ge d(p)\cdot4^k+\frac43$ for some constant $d(p)$ that depends on $p$ but not on $k$.
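To spell out the recurrence step: writing $b_{k+1}\ge 4(b_k-1)$ as $b_{k+1}-\frac43\ge 4\big(b_k-\frac43\big)$ and iterating from some base index $k_0$ gives $$ b_k-\tfrac43\;\ge\;4^{\,k-k_0}\Big(b_{k_0}-\tfrac43\Big)\,,\qquad\text{i.e.}\qquad b_k\;\ge\;d(p)\cdot4^k+\tfrac43 \quad\text{with}\quad d(p):=4^{-k_0}\Big(b_{k_0}-\tfrac43\Big)\;. $$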

What this shows is that we can’t make use of the increasing number of observations with increasing $k$ to devise a clever guessing strategy that scales with less than $4^k$. Limiting the failure probability to $p$ requires at least about $d(p)\cdot4^k$ observations for the $k$-th answer. It’s clear that $d(p)$ increases with decreasing $p$, and thus the cost of the observations for the last guess, which is at least

$$ 2^{-(K-1)}d\left(\frac a{\sqrt T}\right)4^K=2^{K+1}d\left(\frac a{\sqrt T}\right)\ge2\frac{\sqrt T}ad\left(\frac a{\sqrt T}\right)\;, $$

grows faster than $\sqrt T$.

joriki
  • Another perspective on the "groups of 4" construction is that in guessing the $k$-th value we can simulate the strategy for guessing the $(k+1)$-th value by generating for each observed value four "intermediate" values that sum to the observation and applying the strategy for the $(k+1)$-th value to the resulting sequence of simulated observations – again, we pay for at most $1$ additional observation on account of only being able to guess at the end of each group. – joriki Jan 26 '23 at 10:26