
In fields such as game theory and reinforcement learning, it is standard to consider the regret-minimization strategy. I don't get the motivation for the definition.

Yes, doing your best under worst-case conditions (minimax) is an interesting guarantee. Yes, comparing to an optimal all-knowing player is interesting. But why is the regret metric interesting? I want to maximize my value, not to minimize my regret compared to an unachievable strategy that depends on my actions, and these two may be different.

I understand that in some settings there is no better alternative, but I want a positive argument in favor of regret minimization.

Amit Keinan

2 Answers


I just did a Google search and read the Wikipedia page (https://en.wikipedia.org/wiki/Regret_(decision_theory)) and it seems to explain it fairly well. Regret metrics seem to only be defined for decisions under uncertainty. In such situations, it isn't clear what you mean by "I want to maximize my value." You could try to maximize expected value, but this requires knowing the probability of the various scenarios, or else estimating it.

The short answer to why regret is an interesting metric is that it incorporates risk aversion without requiring you to quantify risk explicitly. The minimax regret section of the Wikipedia article gives an example that compares the standard maximin strategy with minimax regret. I think it is also useful to consider a third strategy: maximizing expectation, under the assumption that the three scenarios are equally likely. This third strategy suggests you should invest completely in stocks, even when mixed portfolios are allowed. I suggest that you also work out, for each investment strategy, what your return would be in each scenario. The maximin strategy produces the most even outcomes across the three scenarios, and maximizing expectation produces the most varied outcomes. Minimizing the maximum regret ends up between these two extremes.
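To make the comparison concrete, here is a minimal sketch of the three decision rules applied to a small payoff table. The numbers in `returns` are illustrative assumptions of mine (three investments, three interest-rate scenarios), not something to take as authoritative, and the sketch only ranks pure choices, not mixed portfolios.

```python
# Illustrative payoff table (assumed numbers): return of each investment
# under three scenarios (rates rise, rates stay flat, rates fall).
returns = {
    "stocks": (-4, 4, 12),
    "bonds": (-2, 3, 8),
    "money market": (3, 2, 1),
}


def maximin(table):
    """Pick the option whose worst-case return is largest."""
    return max(table, key=lambda name: min(table[name]))


def max_expectation(table):
    """Pick the option with the largest mean return (scenarios equally likely)."""
    return max(table, key=lambda name: sum(table[name]) / len(table[name]))


def minimax_regret(table):
    """Pick the option whose worst-case regret is smallest."""
    n_scenarios = len(next(iter(table.values())))
    # Best achievable return in each scenario, with hindsight.
    best = [max(row[j] for row in table.values()) for j in range(n_scenarios)]

    def worst_regret(name):
        return max(best[j] - table[name][j] for j in range(n_scenarios))

    return min(table, key=worst_regret)


choices = {
    "maximin": maximin(returns),
    "max expectation": max_expectation(returns),
    "minimax regret": minimax_regret(returns),
}
print(choices)
```

With these assumed numbers the three rules pick three different investments, and the minimax-regret choice sits between the cautious maximin choice and the expectation-maximizing one.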

NaturalLogZ

An illustrative example of why minimizing regret is interesting is the Traveler's Dilemma:

An airline lost the luggage of two of its passengers. Both passengers happen to have identical luggage, and the airline knows it is worth some value between $2$ and $100$ USD, which it now has to reimburse. To find out the value of the luggage, the airline asks passengers $1$ and $2$ to report numbers $a$ and $b$, respectively. It then reimburses both passengers the lower of the two numbers (because the airline will claim that this is the correct one), gives the passenger who reported the lower number a $2$ USD honesty reward, and charges the passenger who reported the higher number a $2$ USD penalty (note that this penalty is also deducted from the lower reported number). If both passengers report the same number, then both receive exactly that amount.
Thus, we have the following utility functions $$u_1(a, b) = \begin{cases}a+2 & \text{if } a < b,\\a & \text{if } a = b,\\b-2 & \text{if } a > b,\end{cases}$$ and $u_2(a, b) = u_1(b, a)$.

Now, if both players play cooperatively, they could both report the number $100$ and receive that amount. This is clearly a welfare-maximizing strategy. However, maximizing utility is not something one can simply do in game theory, because there are multiple agents whose decisions are not yours but influence what is possible for you. That is, the state $(100, 100)$ is not stable (i.e. not a Nash equilibrium), as both players have an incentive to deviate: if $1$ knows that $2$ is reporting $100$, then $1$'s best response is to report $99$ and receive $99+2=101$ instead of just $100$. Iterating this reasoning shows that the (unique) Nash equilibrium is $(2, 2)$, which is indeed the welfare-minimizing state.
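The undercutting argument can be checked by brute force. The following is a small sketch using the utility function $u_1$ from above; the helper names (`u1`, `best_response`, `STRATEGIES`) are my own and not from any library.

```python
STRATEGIES = range(2, 101)  # admissible reports: 2, ..., 100


def u1(a, b):
    """Payoff of the player reporting a when the other player reports b."""
    if a < b:
        return a + 2
    if a == b:
        return a
    return b - 2


def best_response(b):
    """Report that maximizes u1 against a fixed opposing report b."""
    return max(STRATEGIES, key=lambda a: u1(a, b))


# Start from the cooperative report 100 and keep undercutting.
report, steps = 100, 0
while best_response(report) != report:
    report = best_response(report)
    steps += 1

# Enumerate all pure Nash equilibria: each report is a best response to the other.
best_payoff = {b: max(u1(a, b) for a in STRATEGIES) for b in STRATEGIES}
equilibria = [
    (a, b)
    for a in STRATEGIES
    for b in STRATEGIES
    if u1(a, b) == best_payoff[b] and u1(b, a) == best_payoff[a]
]
print(report, steps, equilibria)
```

The loop walks down from $100$ one unit at a time until it reaches $2$, and the enumeration finds $(2, 2)$ as the only pure-strategy equilibrium.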

Certainly, this is not really a rational outcome of this game. In some sense, a regret-minimizing player is instead looking for a solution that is more robust against irrational play.

Given a strategic game $(N, (S_i)_{i \in N}, (u_i)_{i \in N})$, we denote the maximum utility that a player $i \in N$ can achieve against a profile $s_{-i}$ by $$u_i^\ast(s_{-i}) := \max_{s_i \in S_i} u_i(s_i, s_{-i}).$$ We define the regret that player $i$ experiences for playing $s_i$ against $s_{-i}$ by $$\mathsf{regret}_{u_i}(s_i, s_{-i}) := u_i^\ast(s_{-i}) - u_i(s_i, s_{-i}).$$ The maximum regret that a player can experience by playing a strategy $s_i \in S_i$ is $$\mathsf{maxreg}_{u_i}(s_i) := \max_{s_{-i} \in S_{-i}} \mathsf{regret}_{u_i}(s_i, s_{-i}).$$ A regret minimizing player is a player who aims to minimize the maximum regret, i.e. she wants to find a strategy $s_i \in S_i$ such that $\mathsf{maxreg}_{u_i}(s_i)$ is minimized.

Applying these definitions shows that the strategies $\{96, \ldots, 100\}$ minimize the maximum regret. To most people, these numbers should seem to be much more reasonable strategies for the Traveler's Dilemma than the Nash equilibrium.
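If you prefer to verify this numerically, here is a minimal brute-force sketch of the definitions above for the two-player Traveler's Dilemma. The function names mirror $u_i^\ast$, $\mathsf{regret}$ and $\mathsf{maxreg}$ but are otherwise my own choice.

```python
STRATEGIES = range(2, 101)  # admissible reports: 2, ..., 100


def u1(a, b):
    """Payoff of the player reporting a when the other player reports b."""
    if a < b:
        return a + 2
    if a == b:
        return a
    return b - 2


def best_utility(b):
    """u*(b): the most one could have earned against report b, in hindsight."""
    return max(u1(a, b) for a in STRATEGIES)


def regret(a, b):
    """How much was lost by reporting a instead of the best reply to b."""
    return best_utility(b) - u1(a, b)


def max_regret(a):
    """Worst-case regret of reporting a over all opposing reports."""
    return max(regret(a, b) for b in STRATEGIES)


table = {a: max_regret(a) for a in STRATEGIES}
lowest = min(table.values())
minimizers = [a for a in STRATEGIES if table[a] == lowest]
print(lowest, minimizers)
```

For comparison, `table[2]` (the maximum regret of the Nash strategy) is $97$: reporting $2$ against an opponent who reports $100$ earns $4$ where $101$ was possible.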

Moreover, if we restrict the strategy space to these numbers, then $97$ is the only strategy that survives a procedure one may call iterated regret minimization. This procedure is presented in [Halpern, Pass (2012)], where you can also find the calculations for the example above (although it is instructive and not too difficult to do them yourself).
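The iteration can be sketched in a few lines as well. This is a simplified version that relies on the game being symmetric, so both players keep the same surviving set in every round; it is my own illustration and not the general definition from the paper.

```python
def u1(a, b):
    """Payoff of the player reporting a when the other player reports b."""
    if a < b:
        return a + 2
    if a == b:
        return a
    return b - 2


def minimize_regret(strategies):
    """Keep the strategies with minimal maximum regret, where both
    players are restricted to the given strategy set."""
    strategies = list(strategies)
    best = {b: max(u1(a, b) for a in strategies) for b in strategies}
    worst = {a: max(best[b] - u1(a, b) for b in strategies) for a in strategies}
    lowest = min(worst.values())
    return [a for a in strategies if worst[a] == lowest]


def iterated_regret_minimization(strategies):
    """Repeat the deletion step until the surviving set stops shrinking."""
    current = list(strategies)
    while True:
        survivors = minimize_regret(current)
        if survivors == current:
            return current
        current = survivors


first_round = minimize_regret(range(2, 101))
final = iterated_regret_minimization(range(2, 101))
print(first_round, final)
```

Within the restricted set $\{96, \ldots, 100\}$, reporting $97$ has maximum regret $2$ while every other remaining report still has maximum regret $3$, which is why it is the sole survivor.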

ttnick