Explicit formula for tournament sequence

Question

I am looking for an explicit formula for a sequence. The sequence is generated as follows:

There is a tournament with $10$ teams. In the beginning, all teams have a 0-0 win-loss record. The teams are paired up, and when the first round concludes, $5$ of the teams have a 0-1 record and $5$ of the teams have a 1-0 record. The second round is arranged so that only teams with the same win-loss record are allowed to face each other, and when there are an odd number of teams having a certain win-loss record, the extra team from one win-loss record plays the extra team from the nearby win-loss record. This means that for the second round, four of the 0-1 teams are paired up with each other, $4$ of the 1-0 teams are paired up with each other, and there is one pairing between a 0-1 team and a 1-0 team. We also assume that when there is a pairing across different win-loss records, the team with the better record always wins. This means that at the end of round 2, there will be $3$ teams with a 0-2 record, there will be $4$ teams with a 1-1 record, and $3$ teams with a 2-0 record.

I am looking for an explicit formula giving the win-loss record distribution at round $r$. I am also looking for a general formula for $T$ teams (I used "10" earlier just for illustration purposes).

I have python code which explicitly calculates this sequence for $T$ teams and round $r$, but I would like a mathematical expression for this.

I have an explicit formula for the first $4$ rows using ceiling/floor functions, but the formula does not work beyond $4$ rows. I also have an explicit formula giving the win-loss record in the long term, but it only works after about $T^2$ rows.

In terms of problem solving techniques that I've tried, I've tried various generating series approaches. For example, for the case of $10$ teams, I wrote round $0$ as

$$f_0(x,y) = y^0 + y^1 + y^2 + ... + y^9$$

and round $1$ as

$$f_1(x,y) = y^0 + y^1 + y^2 + y^3 + y^ 4 + (y^5 + y^6 + y^7 + y^8 + y^9)x$$

and round $2$ as

$$f_2(x,y) = y^0 + y^1 + y^2 + (y^3 + y^4 + y^5 + y^6)x + (y^7 + y^8 + y^9)x^2$$

and so on. I don't have a method for moving from $f_k$ to $f_{k+1}$, but I have lots of near misses.

I have also tried using a probabilistic approach. I have started from several invariants, and I have a function that when minimized gives a (sometimes) close approximation. The invariants are as follows:

The number of teams $T$ does not change from round to round.
The number of total wins at each round is always $T r / 2$.
The win-loss record is symmetric around $r/2$.

These invariants restrict the number of possible choices for round $r$. I then took the choice that most nearly matches a binomial distribution (using the euclidean distance as a metric, although any similarity metric could work). This metric isn't best, since I believe the win-loss records become "less binomial" over time.

Another thing I tried was various kinds of graphing. I can post some graphs if requested.

I am open to any approach, I look forward to seeing your ideas.

Updates and extra information

Here are the first few terms of the sequence for $T=10$:

$$10$$

$$5, 5$$

$$3, 4, 3$$

$$2, 3, 3, 2$$

$$1, 3, 2, 3, 1$$

$$1, 1, 3, 3, 1, 1$$

$$1, 0, 3, 2, 3, 0, 1$$

$$1, 0, 1, 3, 3, 1, 0, 1$$

$$1, 0, 0, 3, 2, 3, 0, 0, 1$$

$$1, 0, 0, 1, 3, 3, 1, 0, 0, 1$$

$$...$$

Another clarification is that the number of teams $T$ must be even.

Also, I would like to add the formula for the long-term behavior of the sequences. The long-term behavior shows a 2-term repetition, and this repetition depends on the value of $T \pmod 8$. Here are the four cases:

Case 1. ($T \equiv 0 \pmod 8$): $1444...444...441$, $3444...4444....443$

Case 2. ($T \equiv 2 \pmod 8$): $3444...424...443$, $1444...4334....441$

Case 3. ($T \equiv 4 \pmod 8$): $3444...444...443$, $1444...4444....441$

Case 4. ($T \equiv 6 \pmod{8}$): $1444...424...441$, $3444...4334....443$

Note that I have left off the preceding $1000...$ and the trailing $...0001$, since the number of zeros can be easily calculated. Not also that the number of $4$s can be calculated easily from the original number of teams $T$.

Regarding the generating series idea previously, one thing I found is that if we have for example

$$f_2(x,y) = y^0 + y^1 + y^2 + (y^3 + y^4 + y^5 + y^6)x + (y^7 + y^8 + y^9)x^2$$

then we can calculate something very close to $f_3(x,y)$ by performing the following steps:

$$\frac{f_2(x,y) + f_2(x,-y)}{2} = y^0 + y^2 + (y^4 + y^6)x + (y^8)x^2$$
$$\frac{f_2(x,y) - f_2(x,-y)}{2} \cdot x = y^1 x + (y^3 + y^5) x^2 + (y^7 + y^9) x^3$$
$$\widetilde{f}_3(x,y) = \frac{f_2(x,y) + f_2(x,-y)}{2} + \frac{f_2(x,y) - f_2(x,-y)}{2} \cdot x = y^0 + y^2 + (y^1 + y^4 + y^6)x + (y^3 + y^5 + y^8)x^2 + (y^7 + y^9)x^3$$

which when evaluated at $y = 1$ gives the correct sequence $2,3,3,2$. It is not exactly correct, since the $y$ terms are no longer ordered in terms of exponents, so this process is not repeatable unless we improve on it somehow.

Another Update

Let's add a variant to the original rules. The variant will be called "underdog", whereas the original rules will be called "overdog". The "overdog" rules stipulate that the team with the higher win-loss record will always win when it crosses over and faces a team with a lower win-loss ratio; in contrast, the "underdog" rules stipulate that the team with the lower win-loss ratio will win.

Let's call the overdog sequence with $T$ teams the sequence $O(T)$, and each term in the sequence will be identified as $O_{r,w}(T)$ where $r$ denotes the round and $w$ denotes the number of wins at that round. For the underdog variant, we define $U(T)$ and $U_{r,w}(T)$ in the same way.

I just discovered the following properties:

$$O(2^k + 2) = U(2^k) + O(2)$$

$$O(2^k + 4) = U(2^k) + U(2) + O(2)$$

This is helpful since $U(2^k)$ and $O(2^k)$ are easy to calculate for $k$ rows, and $O(2)$ and $U(2)$ are easily calculated for all rows.

I'm wondering if we can use the binary representation of $T$ and some kind of summing procedure to improve the number of rows that we can directly calculate. It would be fascinating to try to reduce $O(T)$ and $U(T)$ to the sum of terms of the form $U(2^K)$ or $O(2^K)$ for any arbitrarily large $K >> 0$. This would give us a way to calculate arbitrarily many rows for any $T$.

Correction to Previous Update

I should edit my previous update to be more precise. The powers of two were a bit of a red herring — all that matters is $T \pmod{4}$. Here are the updated relations:

$O(4k + 2) = U(4k) + O(2)$.
$U(4k + 2) = U(4k) + U(2)$.
$O(4k + 4) = U(4k) + U(2) + O(2)$.

This means that the problem is essentially reduced to finding $U(4k)$ instead of finding $U(2k)$. Not much of a reduction, but it's progress!

Your jumped pairing rule messes up everything... It can jump between groups i.e. odd-even-even-odd-even-even-odd. I don't think such closed formula exists. The first goal is to find the $f_k \to f_{k+1}$ inductive step including your paining rule. Without that, the closed formula is only a dream. Better, change your pairing rule to something more... algebraic. — Brethlosze, Feb 28 '23 at 00:12
@Brethlosze I agree, the jumped pairing rule makes the problem very difficult. What would be a more algebraic rule here? I am open to modifying the problem accordingly, since originally I had made the jumped pairing probabilistic (50% chance of the "underdog" winning) so that the win-loss record becomes a probability distribution at round $r$ for $T$ teams. Let me know if you have ideas. — Jackson, Feb 28 '23 at 00:37
The jump-pairing has no impact (If I understand correctly what you call Jump pairing). We have 4 teams $B,C$ with score 10 and 1 team $A$ with score 11 and 1 team $D$ with score 9. You can decide that next round will be $A$ against $D$ and $B$ against $C$ (associate teams with same score first), or $A$ against $B$ and $C$ against $D$, final result will be in both cases $A$ and ($B$ or $C$) winning, and $D$ and ($B$ or $C$) loosing. — Lourrran, Feb 28 '23 at 07:53
@Lourrran Very interesting idea! So in the case of something like $3, 4, 2, 4, 3, 5, 2, 2, 4, 7$, you're saying that there are two odd blocks ($3, ..., 3$ and $ 5, ..., 7$) such that head of an odd block is "in communication with" the tail of that same odd block bypassing all the even terms in between, simplifying the construction of the next row: is this correct? I will play around with this idea in my handwritten notes. — Jackson, Feb 28 '23 at 13:44
Could you please clarify the definition of "nearby win-loss record" in "when there are an odd number of teams having a certain win-loss record, the extra team from one win-loss record plays the extra team from the nearby win-loss record"? How do you handle the situation when there are multiple teams at the same distance (e.g one team with one more win and a second team with one less win). — Steven Clark, Mar 07 '23 at 02:08
@StevenClark Sure, here's an example. We have a win distribution of $2,3,5,3,2,1$: this means $2$ teams have $0$ wins, ..., up to $1$ team with $6$ wins. Here's how we match teams. The $2$ teams with $0$ wins are paired. $2$ of the $3$ teams with $1$ win are paired, and the remaining $1$ team with $3$ wins is paired with $1$ team with $2$ wins. The $4$ remaining teams with $2$ wins are paired. $2$ of the $3$ teams with $3$ wins are paired. The remaining $1$ team with $3$ wins is paired with $1$ team with $4$ wins. The remaining $1$ team with $4$ wins is paired with the $1$ team with $5$ wins. — Jackson, Mar 07 '23 at 18:33
@StevenClark I forgot to mention that you get the same outcome if start pairing from left-to-right (as in the example) or from right-to-left. — Jackson, Mar 07 '23 at 18:35

score 0 · Answer 1 · answered Mar 07 '23 at 22:03

(Not a solution. Too long to be a comment).

Let $f_n(x)$ be the generating function for performance after $n$ rounds.
We start with $f_0 = nx^0$, as all $n$ teams have won 0 matches.
Let $g_n(x) $ be $f_n(x)$ evaluated in $\mathbb{F}_2[x]$.
Let $H$ be a function that takes a polynomial with coefficients $\{0, 1\}$, an even number of them being 1, and returns:

Start with the term with highest exponent, multiply that term by $x$.
Multiply the term with the next highest exponent by $1$.
Multiply the term with the next highest exponent by $x$.
Multiply the term with the next highest exponent by $1$.
So on and so forth till we've gone through all the coefficients.

As an explicit example, $H(1+x^2+x^3+x^6) = 1 + 2x^3 + x^7$.
In a sense, $H(p(x)) \sim \frac{1+x}{2} p(x)$, but then we have bunch the coefficients so that they become integers.

With those definitions,

$$f_{n+1} (x) = \left(\frac{1+x}{2}\right) [f_n(x) - g_n(x) ] + H[g_n(x)].$$

The issue is the I don't know of a way to evaluate $H[g_n(x)]$, in order to attempt to find a closed form.

This gives us $f_n(x) \sim n(\frac{1+x}{2})^n$, which collaborates OP's observation that "it seems to be a binomial distribution, but gets less binomial over time".

In fact, if we let $H$ act on all polynomials by "multiplying by $(1+x)/2$ and then bunching coefficients accordingly", then $$f_{n+1}(x) = H[f_n(x)].$$ The above just reflects that $H$ acts very nicely on even coefficients.

Thanks for your post/comment, I'll try to see if I can make progress further in this generating function direction — Jackson, Mar 08 '23 at 17:06
Perhaps the function $H$ we're looking for is \begin{align} H_k[g(x)] = x^{1-k}\frac{g(x) + g(-x)}{2} + x^{k}\frac{g(x) - g(-x)}{2} \end{align} where $k = \text{deg}(g)$ over $\mathbb{F}_2[x]$. — Tom Chen, Mar 09 '23 at 03:17
@TomChen No it's not. If $k$ is odd, then the RHS contributes a $x^{2k}$ term. Whereas deg $H(g)$ is just $k+1$. — Calvin Lin, Mar 09 '23 at 14:02

score -2 · Answer 2 · answered Mar 06 '23 at 04:02

Here is a generating function approach that works: Let $f_r(x)$ be the generating function for win-loss records after $r$ rounds. Then: \begin{align*} f_0(x) &= 1\\ f_1(x) &= 1 + x\\ f_r(x) &= f_{r-1}(x)(1 + x)(1 + x^2)\ldots(1 + x^{r-1}) \end{align*} The coefficient of $x^k$ in $f_r(x)$ is the number of teams with $k$ wins after $r$ rounds. This works because in each round, teams are paired up with others of the same win-loss record (when possible), and the team with more wins always wins. So you're really just multiplying the generating function by $(1+x)$ shifted versions of itself. From this you can extract explicit formulas for the number of teams with $k$ wins after $r$ rounds by expanding the generating function and collecting coefficients.

Here are some additional thoughts:

The generating function approach gives a compact formula, but evaluating it to get explicit terms quickly becomes tedious. An alternative is to think about the process directly:

After $r$ rounds, consider teams with $k$ wins. These came from teams with $k-1$ wins that won their match, and teams with $k$ wins that lost their match (if possible). So you have a recurrence like: \begin{align*} a_{r,k} &= a_{r-1,k-1} + a_{r-1,k}\\ a_{r,0} &= a_{r-1,1}\\ a_{r,T/2} &= a_{r-1,T/2-1} \end{align*} where $a_{r,k}$ is the number of teams with $k$ wins after round $r$. This is easier to evaluate directly, and generalizes easily to other numbers of teams/wins.

The long-term behavior also has a nice explanation: after enough rounds, almost all matches are between teams with nearly equal win counts (close to $T/2$). In that case, wins and losses essentially balance out, and you just see an oscillation between the two closest win counts to $T/2$. The period of this oscillation depends on the parity of $T$ (mod 8) because that determines how the win counts are arranged around $T/2$.

Those recurrence relations are very helpful. A few more thoughts:

The binary representation of $T$ isn't too useful here, since the recurrences relate sequences for $4k+2$ and $4k+4$ to those for $4k$. The powers of 4 are really what matter, not the individual bits. You can extend the recurrences to get formulas for arbitrary $T$: \begin{align*} O(T) &= O((T-2)/4) + O(2) + (T \mod 4 = 2)O(2)\\ U(T) &= U((T-4)/4) + U(2) + (T \mod 4 = 2)(U(2) + U(2)) \end{align*} These let you calculate the sequences for any $T$, not just powers of 4. The long-term behavior is similar, but the period doubles from 4 to 8 when switching from overdog to underdog. This is because in the underdog case, teams with win counts close to $T/2$ can switch between the two closest counts after either winning or losing, so you get oscillations at half the rate.

Thanks for your answer, this formula looks promising, but I think it might need to be modified. How could we modify this so that for $T=10$ we obtain $f_0(x) = 10$ and $f_1(x) = 5 + 5x$ and so on? It's also possible that I'm misinterpreting the formula — Jackson, Mar 06 '23 at 18:56
This does not make sense to me. What we have is $ f_n(x) \approx f_{n-1} (x) ( 1+x) / 2$. However, because we need to deal with the coefficients that are odd, it is not an exact equality. — Calvin Lin, Mar 07 '23 at 21:44

Explicit formula for tournament sequence

2 Answers2