In a previous question (Calculating the Probability that 3 Random Numbers Sum to a Certain Number), I learned about the total number of ways that 3 random numbers between 1-100 can sum to 50.

But now I am interested in knowing if there is a way to estimate the "average number of times three random numbers (between 1-100) need to be generated before they sum to 50".

Conceptually, I know that someone could write a WHILE LOOP that attempts to estimate this number - but this could take a very long time. For example, here is some R code that can do this:

    list_results <- list()

    for (i in 1:100) {
      num_1_i <- num_2_i <- num_3_i <- 0
      sub_index <- 0  # number of triples generated so far

      # sample() draws integers from 1 to 100; runif() would return continuous
      # values whose sum essentially never equals 50 exactly
      while (num_1_i + num_2_i + num_3_i != 50) {
        num_1_i <- sample(1:100, 1)
        num_2_i <- sample(1:100, 1)
        num_3_i <- sample(1:100, 1)
        sub_index <- sub_index + 1
      }

      inter_results_i <- data.frame(i, num_1_i, num_2_i, num_3_i, sub_index)
      list_results[[i]] <- inter_results_i
    }

    do.call(rbind, list_results)

I know that in Probability Theory, we can use Markov Chains to find out quantities such as the "Mean Hitting Time" which describe the number of transitions required on average before a certain sequence of state transitions is observed - but in this case, I have 100 states and it would take far too long to write out the transition matrix for this problem and then attempt to perform algebraic operations on this matrix.

Thus, in general (with Markov Chains or without Markov Chains) - how would one attempt to estimate the "average number of times three random numbers (between 1-100) need to be generated before they sum to 50"?

Thanks!

stats_noob

2 Answers

5

I don’t think you need any kind of Markov chain argument here. You’re essentially asking a question of the following form:

“On average, how many times must I roll an $n$-sided die before a 1 comes up?”

Your die just happens to be very complicated with a lot of sides. You’re generating three random numbers, checking if they sum to $50$, and then throwing them out and generating new numbers if they fail to sum to the correct value. On each trial you’re generating new numbers independently. Let $S$ be the sum you’re interested in. We’re interested in the random variable $X := \text{“the number of trials until we have S = 50” }$.

In the post you link we see that $\mathbb{P}[S = 50] = \frac{208}{171700}$. Thus we’re looking at a geometric random variable with parameter $p := \frac{208}{171700}$. NOTE: this number assumes you are really interested in unordered triples instead of ordered triples, as explained in the post you link.
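As a sanity check on the two counts behind this probability, here is a minimal Python sketch (Python and all variable names are my own choices, not from the linked post). It enumerates unordered triples with repetition allowed from 1 to 100 and assumes, as the figure $\frac{208}{171700}$ does, that every such unordered triple is treated as equally likely.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# All unordered triples (repetition allowed) drawn from 1..100.
triples = list(combinations_with_replacement(range(1, 101), 3))
total = len(triples)

# Triples whose entries sum to 50.
hits = sum(1 for t in triples if sum(t) == 50)

# Probability under the "all unordered triples equally likely" assumption.
p = Fraction(hits, total)
print(hits, total)  # 208 171700
print(float(p))
```

The enumeration is small enough (171,700 triples) that brute force is fine here; no closed-form counting is needed to confirm the numbers.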

Since $X$ is distributed as $\text{Geo}(p)$, we may immediately calculate:

$$\begin{align*} \mathbb{E}[X] & = \frac{1}{p} = \frac{171700}{208} \approx 825.5 \\ \text{median}(X) &= \text{Ceil} \left [ \frac{-1}{\log_2(1 - p)} \right ] = 572 \end{align*}$$
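The two values above can be reproduced numerically. The following is a small Python sketch (the helper `geometric_cdf` is a hypothetical name introduced only for this illustration) that takes $p = \frac{208}{171700}$ as given and checks the median directly against the geometric CDF $1 - (1-p)^k$.

```python
import math

p = 208 / 171700  # success probability per trial, taken from the answer

# Mean of a geometric random variable counting trials up to the first success.
expected_trials = 1 / p

# Median: smallest k with P(X <= k) >= 1/2.
median_trials = math.ceil(-1 / math.log2(1 - p))


def geometric_cdf(k, p):
    """P(X <= k) for the number of trials up to and including the first success."""
    return 1 - (1 - p) ** k


print(round(expected_trials, 2))  # 825.48
print(median_trials)              # 572
print(geometric_cdf(median_trials - 1, p) < 0.5 <= geometric_cdf(median_trials, p))  # True
```

The median is noticeably smaller than the mean because the geometric distribution is right-skewed: a few very long waits pull the average up.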

Joe
  • @ Joe: thank you so much for your answer! Just to clarify, "ceil" = ceiling? (i.e. round the number up) – stats_noob Jul 31 '22 at 01:33
  • @stats_noob yes exactly – Joe Jul 31 '22 at 01:34
  • @stats_noob also note that I don’t think your code is doing what you think it’s doing. Based on your other question that you linked, you’re interested in unordered triples, not ordered triples. Your code is looking at ordered triples. I wrote my answer in a way that leaves this ambiguous, but the 208/171700 number is for unordered triples. – Joe Jul 31 '22 at 01:38
  • @ Joe: thanks again! I took a probability class where the prof gave us this question : how many times do you need to roll two dice before you observe (4,6) VS how many times do you need to roll two dice before you observe (6,6)? He told us that you need to use Markov Chains to solve this problem by finding out the "absorption time". Do you think the approach that you suggested in this answer can also be used to solve this problem? I think I still have the answer in my notes - it would be interesting to see if this approach provides the same answer as the Markov Chain approach! – stats_noob Jul 31 '22 at 01:38
  • I think you may be misremembering. You would be interested in using a Markov chain if you’re trying to calculate something like “how many times do you need to roll a die before you observe the sequence (4,6)”. This is a very different question from “how many times do you need to roll two dice before you observe the pair (4,6)”. – Joe Jul 31 '22 at 01:41
  • @ Joe: thank you for your reply! I am not sure if I can pick up on the difference between these two questions. Can you please explain it a bit more? Thank you! – stats_noob Jul 31 '22 at 01:44
  • Is it ok if we do this here so that other people can read this for reference? – stats_noob Jul 31 '22 at 01:46
  • I answered in the chat, and there is a link in the comments. Stack exchange frowns upon extended back-and-forths in the comments, and I intend on respecting the stylistic preferences of the site. – Joe Jul 31 '22 at 01:54
  • There are $(100)^3$ ways of choosing the $(3)$ numbers, with replacement. See my answer. – user2661923 Jul 31 '22 at 02:12
  • @user2661923 If you look at the post that stats_noob linked, it seems that they are actually interested in unordered triples. If they change their mind about that and are actually interested in all $100^3$ ordered triples, then the parameter should change to $\binom{49}{2} / 100^3$. In this case, your answer agrees with the expectation I gave, just without the reasoning of looking at a geometric random variable. – Joe Jul 31 '22 at 02:19
  • @ Joe: for the record - technically speaking, both dice questions can be answered with Markov Chains - correct? – stats_noob Jul 31 '22 at 02:32
  • @stats_noob This is correct. My comment about not needing to use Markov chains was there to point out that there is a less computationally costly (and in my opinion, conceptually easier) approach. That isn’t to say that you couldn’t solve these problems using Markov chains. It is also to address your professor’s comment that you need Markov chains for one of the problems. – Joe Jul 31 '22 at 02:42
  • In the OP's code, in his posting, he is having each variable run from $(1)$ through $(100)$. Further, the $~\displaystyle \binom{49}{2}~$ computation only makes sense if the sampling is done with replacement. – user2661923 Jul 31 '22 at 03:04
  • @ Joe: thank you so much for your answer! If you have time, do you think you could please show me how the second question might be solved using markov chains? thank you so much! – stats_noob Jul 31 '22 at 04:24
1

Alternative approach:

You are sampling with replacement.

If you perform $~\displaystyle (100)^3~$ trials, you should expect $~\displaystyle \binom{49}{2}~$ successes.

So, you divide the number of trials by the number of successes to get the average number of trials per success,

$$\frac{(100)^3}{\binom{49}{2}}. \tag1 $$

The denominator of (1) above represents the number of distinct solutions to

  • $x_1 + x_2 + x_3 = 50 ~: ~x_1, x_2, x_3 \in \Bbb{Z^+}.$

  • $(x_1, x_2, x_3) = (48,1,1)$ is considered distinct from $(x_1, x_2, x_3) = (1,48,1)$.
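
The count in the denominator of (1), and the resulting average, can be checked with a short Python sketch. It assumes three independent draws, each uniform on the integers 1 to 100 (sampling with replacement); the function name `trials_until_hit`, the seed, and the number of runs are arbitrary choices made for this illustration.

```python
import random
from math import comb

TARGET = 50

# Count ordered triples (a, b, c) with entries in 1..100 summing to TARGET:
# once a and b are fixed, c is determined and only needs to be in range.
ordered_hits = sum(
    1
    for a in range(1, 101)
    for b in range(1, 101)
    if 1 <= TARGET - a - b <= 100
)
expected_trials = 100**3 / ordered_hits
print(ordered_hits, comb(49, 2))   # 1176 1176
print(round(expected_trials, 2))   # 850.34


def trials_until_hit(rng):
    """Draw triples until one sums to TARGET; return how many were drawn."""
    count = 0
    while True:
        count += 1
        total = rng.randint(1, 100) + rng.randint(1, 100) + rng.randint(1, 100)
        if total == TARGET:
            return count


# Monte Carlo estimate of the same average (noisy; more runs tighten it).
rng = random.Random(0)
runs = 500
simulated_mean = sum(trials_until_hit(rng) for _ in range(runs)) / runs
print(simulated_mean)
```

With only a few hundred runs the simulated mean can easily be off by several dozen trials, since a single waiting time has a standard deviation close to its mean.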


For the alternative problem of sampling without replacement, the easiest approach is to take the computation in (1) above, and adjust the numerator and denominator.

The numerator will change to $(100 \times 99 \times 98).$ For the denominator:

  • It is impossible for any satisfying solution $x_1, x_2, x_3$ to have all three numbers identical, because $(50)$ is not a multiple of $(3)$.

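  • There are $(3 \times 24)$ solutions where exactly two of the three numbers are identical: the repeated value can be any of $1, \dots, 24$, and the remaining number $50 - 2x$ can occupy any of the $3$ positions. These must be deducted from the denominator.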

So, the revised computation is

$$\frac{100 \times 99 \times 98}{\binom{49}{2} - (3 \times 24)}.$$
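
A brute-force check of the revised numerator and denominator, as a minimal Python sketch. It assumes the three numbers are drawn without replacement from 1 to 100 and that order matters; the variable names are illustrative only.

```python
from itertools import permutations
from math import comb

# Every ordered triple of three distinct numbers from 1..100.
total = 0
hits = 0
for triple in permutations(range(1, 101), 3):
    total += 1
    if sum(triple) == 50:
        hits += 1

expected_trials = total / hits
print(total, 100 * 99 * 98)         # 970200 970200
print(hits, comb(49, 2) - 3 * 24)   # 1104 1104
print(round(expected_trials, 2))    # 878.8
```

So under this model the average is roughly 879 trials, slightly more than in the with-replacement case.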

user2661923