Numerical Analysis - Brute Force (Optional), NumPy
I wrote a Python program to estimate numerically the round at which the cumulative probability of having seen every choice reaches 99%. For Question 1 (select 5 out of 100), the empirical CDF crosses 99% at round 181 in a run of 10,000 trials, and repeated runs land between 181 and 185.
Question 1, Part 1: the answer is Round 181 to Round 185.
Percentage(Round number) = PMF (empirical probability mass function)
Cumulative(Round number) = CDF (empirical cumulative distribution function)
PMF and CDF of rounds taken for 100 total choices and window size 5
Choosing 5 choices every round till all 100 choices are seen
053 rounds: 001 times, Percentage:000.0% Cumulative:00.01%
054 rounds: 003 times, Percentage:000.0% Cumulative:00.04%
055 rounds: 003 times, Percentage:000.0% Cumulative:00.07%
056 rounds: 001 times, Percentage:000.0% Cumulative:00.08%
057 rounds: 005 times, Percentage:000.1% Cumulative:00.13%
058 rounds: 004 times, Percentage:000.0% Cumulative:00.17%
059 rounds: 010 times, Percentage:000.1% Cumulative:00.27%
060 rounds: 010 times, Percentage:000.1% Cumulative:00.37%
061 rounds: 017 times, Percentage:000.2% Cumulative:00.54%
062 rounds: 014 times, Percentage:000.1% Cumulative:00.68%
063 rounds: 022 times, Percentage:000.2% Cumulative:000.9%
064 rounds: 020 times, Percentage:000.2% Cumulative:001.1%
065 rounds: 031 times, Percentage:000.3% Cumulative:01.41%
066 rounds: 035 times, Percentage:000.3% Cumulative:01.76%
067 rounds: 050 times, Percentage:000.5% Cumulative:02.26%
...
072 rounds: 095 times, Percentage:000.9% Cumulative:05.62%
073 rounds: 096 times, Percentage:001.0% Cumulative:06.58%
074 rounds: 109 times, Percentage:001.1% Cumulative:07.67%
075 rounds: 091 times, Percentage:000.9% Cumulative:08.58%
076 rounds: 117 times, Percentage:001.2% Cumulative:09.75%
...
098 rounds: 190 times, Percentage:001.9% Cumulative:47.12%
099 rounds: 154 times, Percentage:001.5% Cumulative:48.66%
100 rounds: 166 times, Percentage:001.7% Cumulative:50.32%
101 rounds: 199 times, Percentage:002.0% Cumulative:52.31%
...
136 rounds: 048 times, Percentage:000.5% Cumulative:90.07%
137 rounds: 052 times, Percentage:000.5% Cumulative:90.59%
138 rounds: 033 times, Percentage:000.3% Cumulative:90.92%
139 rounds: 052 times, Percentage:000.5% Cumulative:91.44%
140 rounds: 055 times, Percentage:000.6% Cumulative:91.99%
141 rounds: 044 times, Percentage:000.4% Cumulative:92.43%
142 rounds: 030 times, Percentage:000.3% Cumulative:92.73%
143 rounds: 036 times, Percentage:000.4% Cumulative:93.09%
144 rounds: 037 times, Percentage:000.4% Cumulative:93.46%
145 rounds: 025 times, Percentage:000.2% Cumulative:93.71%
146 rounds: 029 times, Percentage:000.3% Cumulative:094.0%
147 rounds: 027 times, Percentage:000.3% Cumulative:94.27%
148 rounds: 031 times, Percentage:000.3% Cumulative:94.58%
149 rounds: 033 times, Percentage:000.3% Cumulative:94.91%
150 rounds: 019 times, Percentage:000.2% Cumulative:095.1%
151 rounds: 023 times, Percentage:000.2% Cumulative:95.33%
152 rounds: 026 times, Percentage:000.3% Cumulative:95.59%
153 rounds: 025 times, Percentage:000.2% Cumulative:95.84%
154 rounds: 018 times, Percentage:000.2% Cumulative:96.02%
155 rounds: 013 times, Percentage:000.1% Cumulative:96.15%
156 rounds: 014 times, Percentage:000.1% Cumulative:96.29%
157 rounds: 016 times, Percentage:000.2% Cumulative:96.45%
158 rounds: 021 times, Percentage:000.2% Cumulative:96.66%
159 rounds: 019 times, Percentage:000.2% Cumulative:96.85%
160 rounds: 007 times, Percentage:000.1% Cumulative:96.92%
161 rounds: 021 times, Percentage:000.2% Cumulative:97.13%
162 rounds: 008 times, Percentage:000.1% Cumulative:97.21%
163 rounds: 017 times, Percentage:000.2% Cumulative:97.38%
164 rounds: 017 times, Percentage:000.2% Cumulative:97.55%
165 rounds: 018 times, Percentage:000.2% Cumulative:97.73%
166 rounds: 011 times, Percentage:000.1% Cumulative:97.84%
167 rounds: 012 times, Percentage:000.1% Cumulative:97.96%
168 rounds: 009 times, Percentage:000.1% Cumulative:98.05%
169 rounds: 011 times, Percentage:000.1% Cumulative:98.16%
170 rounds: 014 times, Percentage:000.1% Cumulative:098.3%
171 rounds: 008 times, Percentage:000.1% Cumulative:98.38%
172 rounds: 009 times, Percentage:000.1% Cumulative:98.47%
173 rounds: 010 times, Percentage:000.1% Cumulative:98.57%
174 rounds: 011 times, Percentage:000.1% Cumulative:98.68%
175 rounds: 008 times, Percentage:000.1% Cumulative:98.76%
176 rounds: 005 times, Percentage:000.1% Cumulative:98.81%
177 rounds: 005 times, Percentage:000.1% Cumulative:98.86%
178 rounds: 004 times, Percentage:000.0% Cumulative:098.9%
179 rounds: 005 times, Percentage:000.1% Cumulative:98.95%
180 rounds: 004 times, Percentage:000.0% Cumulative:98.99%
181 rounds: 006 times, Percentage:000.1% Cumulative:99.05%
182 rounds: 004 times, Percentage:000.0% Cumulative:99.09%
...
248 rounds: 001 times, Percentage:000.0% Cumulative:99.97%
252 rounds: 001 times, Percentage:000.0% Cumulative:99.98%
254 rounds: 001 times, Percentage:000.0% Cumulative:99.99%
267 rounds: 001 times, Percentage:000.0% Cumulative:100.0%
The CDF first crosses 99% at this row:
181 rounds: 006 times, Percentage:000.1% Cumulative:99.05%
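The 99% row can also be picked out programmatically instead of by reading the table. The following is a minimal sketch, assuming the per-trial round counts are available as a list like `rounds` in the program; the helper name `first_round_at_or_above` and the toy data are made up for illustration.

```python
from collections import Counter
from itertools import accumulate

def first_round_at_or_above(rounds, level=99.0):
    """Return the smallest round count whose empirical CDF (in %) reaches `level`."""
    counts = Counter(rounds)                      # PMF numerators: round count -> frequency
    ordered = sorted(counts)                      # distinct round counts, ascending
    running_totals = accumulate(counts[r] for r in ordered)
    for r, running in zip(ordered, running_totals):
        if 100.0 * running / len(rounds) >= level:
            return r
    return None  # only reached for an empty list or a level above 100

# Toy data, made up for illustration (not simulation output)
toy_rounds = [3, 4, 4, 5, 5, 5, 6, 6, 7, 9]
print(first_round_at_or_above(toy_rounds, level=90.0))  # prints 7
```

Counting with `Counter` once also avoids rescanning the whole list for every distinct round count.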
The values can be adjusted. With only 10 choices instead of 100 (this run used 1,000 trials, as the counts show):
Question 2, Part 1: the answer for 99% certainty with 10 choices is Round 14.
PMF and CDF of rounds taken for 10 total choices and window size 5
Choosing 5 choices every round till all 10 choices are seen
3 rounds: 45 times, Percentage:4.5% Cumulative:4.5%
4 rounds: 167 times, Percentage:16.7% Cumulative:21.2%
5 rounds: 220 times, Percentage:22.0% Cumulative:43.2%
6 rounds: 210 times, Percentage:21.0% Cumulative:64.2%
7 rounds: 136 times, Percentage:13.6% Cumulative:77.8%
8 rounds: 80 times, Percentage:8.0% Cumulative:85.8%
9 rounds: 48 times, Percentage:4.8% Cumulative:90.6%
10 rounds: 39 times, Percentage:3.9% Cumulative:94.5%
11 rounds: 21 times, Percentage:2.1% Cumulative:96.6%
12 rounds: 10 times, Percentage:1.0% Cumulative:97.6%
13 rounds: 12 times, Percentage:1.2% Cumulative:98.8%
14 rounds: 4 times, Percentage:0.4% Cumulative:99.2%
15 rounds: 2 times, Percentage:0.2% Cumulative:99.4%
16 rounds: 3 times, Percentage:0.3% Cumulative:99.7%
Here is the Python code
import numpy as np

seen = []
history = []
total_choices = 100
window_size = 5

def choose():
    # Note: np.random.choice samples with replacement by default, so one round
    # can repeat a value; pass replace=False for 5 distinct choices per round.
    return list(np.random.choice(total_choices, window_size))

def keep_rolling():
    global seen
    global history
    new_seen = choose()
    # merge the new draws into the distinct choices seen so far
    seen = list(set([*new_seen, *seen]))
    history = [*history, new_seen]

def print_rounds():
    # tweak this while loop to total_choices*99/100 or any optimization number
    while(len(seen) != total_choices):
        keep_rolling()
    #print(str(len(history)) + " rounds taken")
    #for x in history:
    #    print(x)
    return len(history)

trials = 10000
rounds = []
cumulative = 0
for i in range(trials):
    rounds_taken = print_rounds()
    rounds = [*rounds, rounds_taken]
    # reset the state for the next trial
    seen = []
    history = []

print(f"Choosing {window_size} choices every round till all {total_choices} choices are seen")
for x in list(sorted(list(set(rounds)))):
    count = len([y for y in rounds if y==x])
    percentage = count*100/trials
    cumulative = cumulative + percentage
    print(f"{str(x).zfill(3)} rounds: {str(count).zfill(3)} times, Percentage:{str(round(percentage, 1)).zfill(5)}% Cumulative:{str(round(cumulative, 2)).zfill(5)}%")
Each simulation takes around 5 seconds, and repeated runs give almost the same answer, so the 99% round is pinned down to within a few rounds.
At most, the estimate moves about ±4 rounds around Round 181; that is the run-to-run spread of the simulation rather than a formal confidence interval.
So to be on the safe side we can say Round 185 for 100 coupons, 5 draws per round.
A theoretical calculation would give a tighter answer, for example something like 183 ± 2 rounds for Question 1. The distribution has a long right tail that extends to infinity; the 99% round is the point where the tail beyond it holds the last 1% of probability, and that area can be measured by summation or with a binomial-theorem argument.
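One way to do the theoretical calculation without simulation noise is to track the probability of having seen exactly s distinct choices, one draw at a time. The following is a sketch under a stated assumption: every single draw is uniform and independent (with replacement), which is what `np.random.choice` does by default in the program above. The function names `cdf_all_seen` and `first_round_reaching` are illustrative.

```python
def cdf_all_seen(total_choices, window_size, max_rounds):
    """Return [P(all choices seen within r rounds) for r = 0..max_rounds].

    Assumes uniform, independent draws with replacement.
    """
    n = total_choices
    # state[s] = probability that exactly s distinct choices have been seen
    state = [0.0] * (n + 1)
    state[0] = 1.0
    cdf = [state[n]]
    for _ in range(max_rounds):
        for _ in range(window_size):              # one round = window_size single draws
            nxt = [0.0] * (n + 1)
            for s, p in enumerate(state):
                if p == 0.0:
                    continue
                nxt[s] += p * s / n               # the draw repeats a seen choice
                if s < n:
                    nxt[s + 1] += p * (n - s) / n  # the draw reveals a new choice
            state = nxt
        cdf.append(state[n])
    return cdf

def first_round_reaching(total_choices, window_size, level=0.99, max_rounds=400):
    """Smallest round whose exact CDF is at least `level`, or None if not reached."""
    for r, p in enumerate(cdf_all_seen(total_choices, window_size, max_rounds)):
        if p >= level:
            return r
    return None

print(first_round_reaching(10, 5))    # 10 choices, 5 per round
print(first_round_reaching(100, 5))   # 100 choices, 5 per round
```

All terms in this recursion are non-negative, so it avoids the cancellation problems that an alternating inclusion-exclusion sum has in floating point. If each round were meant to draw 5 distinct choices, the transition step would need to change accordingly.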
Here are some solutions for Question 2 by changing the total choices from 100 to some other number
Choosing 5 choices every round till all 6 choices are seen
008 rounds: 063 times, Percentage:000.6% Cumulative:099.6%
Choosing 5 choices every round till all 7 choices are seen
009 rounds: 066 times, Percentage:000.7% Cumulative:99.26%
Choosing 5 choices every round till all 10 choices are seen
14 rounds: 4 times, Percentage:0.4% Cumulative:99.2%
Choosing 5 choices every round till all 20 choices are seen
030 rounds: 026 times, Percentage:000.3% Cumulative:99.05%
Choosing 5 choices every round till all 25 choices are seen
039 rounds: 017 times, Percentage:000.2% Cumulative:99.08%
Choosing 5 choices every round till all 30 choices are seen
047 rounds: 020 times, Percentage:000.2% Cumulative:99.08%
Choosing 5 choices every round till all 50 choices are seen
084 rounds: 010 times, Percentage:000.1% Cumulative:99.04%
Variance calculation/estimation
If I run the simulation for 100 coupons, 5 per round, 10 times, the 99% rows are:
Choosing 5 choices every round till all 100 choices are seen
184 rounds: 010 times, Percentage:000.1% Cumulative:99.04%
Choosing 5 choices every round till all 100 choices are seen
181 rounds: 004 times, Percentage:000.0% Cumulative:099.0%
Choosing 5 choices every round till all 100 choices are seen
183 rounds: 006 times, Percentage:000.1% Cumulative:99.02%
Choosing 5 choices every round till all 100 choices are seen
184 rounds: 004 times, Percentage:000.0% Cumulative:99.02%
Choosing 5 choices every round till all 100 choices are seen
185 rounds: 004 times, Percentage:000.0% Cumulative:99.02%
Choosing 5 choices every round till all 100 choices are seen
185 rounds: 009 times, Percentage:000.1% Cumulative:99.04%
Choosing 5 choices every round till all 100 choices are seen
184 rounds: 006 times, Percentage:000.1% Cumulative:099.0%
Choosing 5 choices every round till all 100 choices are seen
184 rounds: 003 times, Percentage:000.0% Cumulative:99.02%
Choosing 5 choices every round till all 100 choices are seen
183 rounds: 004 times, Percentage:000.0% Cumulative:99.02%
Choosing 5 choices every round till all 100 choices are seen
184 rounds: 006 times, Percentage:000.1% Cumulative:99.04%
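The spread of these ten runs can be summarized directly. This small sketch copies the ten 99% crossing rounds from the listing above; the variable name `crossings` is just for illustration.

```python
from statistics import mean, stdev

# 99% crossing rounds copied from the ten runs listed above
crossings = [184, 181, 183, 184, 185, 185, 184, 184, 183, 184]

print(f"mean = {mean(crossings):.1f}, sample std = {stdev(crossings):.2f}, "
      f"range = {min(crossings)}..{max(crossings)}")
# prints: mean = 183.7, sample std = 1.16, range = 181..185
```

Ten runs are a small sample, so this is only a rough indication of the run-to-run variation.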
Question 1 Part 2 - Question 1 Completed
What if the question starts with some coupons already seen? (k>n)
Let's say we have already seen 32 of the 100 total coupons, we keep choosing 5 coupons per round, and we want to know how many more rounds it will likely (99%) take.
A first approach is to treat this as a calculation with total coupons = 100 - 32 = 68 and a window size of 5, and find the row where the CDF crosses 99%.
Suppose it took 51 rounds to see the first 32 coupons.
Choosing 5 choices every round till all 68 choices are seen
119 rounds: 003 times, Percentage:000.0% Cumulative:99.01%
So it would take another 119 rounds to see the remaining 68 choices,
for a total of 51 + 119 = 170 rounds to see all 100 coupons.
However, this underestimates the rounds needed, because the 32 already-seen coupons can be drawn again. Accounting for that brings us back to the calculation with 100 total coupons and window size 5, but with the starting point at 32 seen.
So we change the starting lines of the Python code as follows:
already_seen = 32
print(f"Already seen {already_seen}")
seen = [x for x in range(already_seen)]
history = []
total_choices = 100
window_size = 5
...
# inside the trial loop, replace the reset `seen = []` with
seen = [x for x in range(already_seen)]
And we get
Already seen 32
Choosing 5 choices every round till all 100 choices are seen
045 rounds: 001 times, Percentage:000.0% Cumulative:00.01%
046 rounds: 001 times, Percentage:000.0% Cumulative:00.02%
047 rounds: 003 times, Percentage:000.0% Cumulative:00.05%
048 rounds: 002 times, Percentage:000.0% Cumulative:00.07%
049 rounds: 003 times, Percentage:000.0% Cumulative:000.1%
...
126 rounds: 051 times, Percentage:000.5% Cumulative:89.05%
127 rounds: 041 times, Percentage:000.4% Cumulative:89.46%
128 rounds: 051 times, Percentage:000.5% Cumulative:89.97%
129 rounds: 047 times, Percentage:000.5% Cumulative:90.44%
130 rounds: 039 times, Percentage:000.4% Cumulative:90.83%
131 rounds: 043 times, Percentage:000.4% Cumulative:91.26%
132 rounds: 042 times, Percentage:000.4% Cumulative:91.68%
133 rounds: 034 times, Percentage:000.3% Cumulative:92.02%
134 rounds: 041 times, Percentage:000.4% Cumulative:92.43%
135 rounds: 037 times, Percentage:000.4% Cumulative:092.8%
136 rounds: 027 times, Percentage:000.3% Cumulative:93.07%
137 rounds: 033 times, Percentage:000.3% Cumulative:093.4%
138 rounds: 042 times, Percentage:000.4% Cumulative:93.82%
139 rounds: 030 times, Percentage:000.3% Cumulative:94.12%
140 rounds: 032 times, Percentage:000.3% Cumulative:94.44%
141 rounds: 028 times, Percentage:000.3% Cumulative:94.72%
142 rounds: 016 times, Percentage:000.2% Cumulative:94.88%
143 rounds: 025 times, Percentage:000.2% Cumulative:95.13%
144 rounds: 023 times, Percentage:000.2% Cumulative:95.36%
145 rounds: 022 times, Percentage:000.2% Cumulative:95.58%
146 rounds: 016 times, Percentage:000.2% Cumulative:95.74%
147 rounds: 010 times, Percentage:000.1% Cumulative:95.84%
148 rounds: 023 times, Percentage:000.2% Cumulative:96.07%
149 rounds: 015 times, Percentage:000.1% Cumulative:96.22%
150 rounds: 012 times, Percentage:000.1% Cumulative:96.34%
151 rounds: 008 times, Percentage:000.1% Cumulative:96.42%
152 rounds: 018 times, Percentage:000.2% Cumulative:096.6%
153 rounds: 021 times, Percentage:000.2% Cumulative:96.81%
154 rounds: 017 times, Percentage:000.2% Cumulative:96.98%
155 rounds: 012 times, Percentage:000.1% Cumulative:097.1%
156 rounds: 019 times, Percentage:000.2% Cumulative:97.29%
157 rounds: 013 times, Percentage:000.1% Cumulative:97.42%
158 rounds: 014 times, Percentage:000.1% Cumulative:97.56%
159 rounds: 010 times, Percentage:000.1% Cumulative:97.66%
160 rounds: 008 times, Percentage:000.1% Cumulative:97.74%
161 rounds: 009 times, Percentage:000.1% Cumulative:97.83%
162 rounds: 011 times, Percentage:000.1% Cumulative:97.94%
163 rounds: 007 times, Percentage:000.1% Cumulative:98.01%
164 rounds: 006 times, Percentage:000.1% Cumulative:98.07%
165 rounds: 011 times, Percentage:000.1% Cumulative:98.18%
166 rounds: 007 times, Percentage:000.1% Cumulative:98.25%
167 rounds: 018 times, Percentage:000.2% Cumulative:98.43%
168 rounds: 005 times, Percentage:000.1% Cumulative:98.48%
169 rounds: 009 times, Percentage:000.1% Cumulative:98.57%
170 rounds: 013 times, Percentage:000.1% Cumulative:098.7%
171 rounds: 006 times, Percentage:000.1% Cumulative:98.76%
172 rounds: 003 times, Percentage:000.0% Cumulative:98.79%
173 rounds: 013 times, Percentage:000.1% Cumulative:98.92%
174 rounds: 003 times, Percentage:000.0% Cumulative:98.95%
175 rounds: 005 times, Percentage:000.1% Cumulative:099.0%
177 rounds: 004 times, Percentage:000.0% Cumulative:99.04%
178 rounds: 002 times, Percentage:000.0% Cumulative:99.06%
179 rounds: 002 times, Percentage:000.0% Cumulative:99.08%
180 rounds: 006 times, Percentage:000.1% Cumulative:99.14%
181 rounds: 004 times, Percentage:000.0% Cumulative:99.18%
182 rounds: 004 times, Percentage:000.0% Cumulative:99.22%
183 rounds: 004 times, Percentage:000.0% Cumulative:99.26%
184 rounds: 005 times, Percentage:000.1% Cumulative:99.31%
185 rounds: 003 times, Percentage:000.0% Cumulative:99.34%
186 rounds: 003 times, Percentage:000.0% Cumulative:99.37%
187 rounds: 002 times, Percentage:000.0% Cumulative:99.39%
188 rounds: 002 times, Percentage:000.0% Cumulative:99.41%
189 rounds: 003 times, Percentage:000.0% Cumulative:99.44%
190 rounds: 001 times, Percentage:000.0% Cumulative:99.45%
191 rounds: 003 times, Percentage:000.0% Cumulative:99.48%
192 rounds: 004 times, Percentage:000.0% Cumulative:99.52%
194 rounds: 001 times, Percentage:000.0% Cumulative:99.53%
195 rounds: 001 times, Percentage:000.0% Cumulative:99.54%
196 rounds: 002 times, Percentage:000.0% Cumulative:99.56%
197 rounds: 004 times, Percentage:000.0% Cumulative:099.6%
198 rounds: 004 times, Percentage:000.0% Cumulative:99.64%
199 rounds: 005 times, Percentage:000.1% Cumulative:99.69%
200 rounds: 002 times, Percentage:000.0% Cumulative:99.71%
201 rounds: 001 times, Percentage:000.0% Cumulative:99.72%
203 rounds: 001 times, Percentage:000.0% Cumulative:99.73%
205 rounds: 001 times, Percentage:000.0% Cumulative:99.74%
207 rounds: 004 times, Percentage:000.0% Cumulative:99.78%
208 rounds: 003 times, Percentage:000.0% Cumulative:99.81%
209 rounds: 001 times, Percentage:000.0% Cumulative:99.82%
210 rounds: 001 times, Percentage:000.0% Cumulative:99.83%
212 rounds: 002 times, Percentage:000.0% Cumulative:99.85%
213 rounds: 002 times, Percentage:000.0% Cumulative:99.87%
214 rounds: 001 times, Percentage:000.0% Cumulative:99.88%
217 rounds: 001 times, Percentage:000.0% Cumulative:99.89%
219 rounds: 001 times, Percentage:000.0% Cumulative:099.9%
220 rounds: 001 times, Percentage:000.0% Cumulative:99.91%
221 rounds: 001 times, Percentage:000.0% Cumulative:99.92%
225 rounds: 001 times, Percentage:000.0% Cumulative:99.93%
226 rounds: 001 times, Percentage:000.0% Cumulative:99.94%
229 rounds: 001 times, Percentage:000.0% Cumulative:99.95%
235 rounds: 001 times, Percentage:000.0% Cumulative:99.96%
237 rounds: 001 times, Percentage:000.0% Cumulative:99.97%
244 rounds: 001 times, Percentage:000.0% Cumulative:99.98%
252 rounds: 001 times, Percentage:000.0% Cumulative:99.99%
279 rounds: 001 times, Percentage:000.0% Cumulative:100.0%
The CDF first crosses 99% at this row:
175 rounds: 005 times, Percentage:000.1% Cumulative:099.0%
So it takes an extra 175 rounds on top of the rounds it took to see the first 32 coupons.
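The gap between 119 rounds (68-coupon shortcut) and 175 rounds (start at 32 of 100) can be sanity-checked with a back-of-the-envelope estimate. This is only a union-bound approximation, assuming independent uniform draws with replacement as in the simulation; `approx_rounds` is a hypothetical helper, not part of the program above.

```python
import math

def approx_rounds(total_choices, unseen, window_size=5, miss_prob=0.01):
    """Union-bound estimate of the rounds needed to see every unseen choice.

    One draw misses a given choice with probability (1 - 1/total_choices), so
    P(something is still unseen after r rounds)
        <= unseen * (1 - 1/total_choices) ** (window_size * r).
    Solve for the smallest r that pushes this bound down to miss_prob.
    """
    per_draw_miss = 1.0 - 1.0 / total_choices
    draws = math.log(miss_prob / unseen) / math.log(per_draw_miss)
    return math.ceil(draws / window_size)

print(approx_rounds(68, 68))    # shortcut: pretend only the 68 unseen coupons exist
print(approx_rounds(100, 68))   # draws still come from all 100 coupons
```

Both estimates land close to the simulated rows above. The difference comes entirely from the per-draw hit chance of a given missing coupon, 1/68 in the shortcut versus 1/100 in the real setup.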
Question 2 Part 2 Completed
If it is not known that there are 100 coupons
To estimate an unknown total, you would have to specify how many rounds we are allowed to use and how much uncertainty in the estimate is acceptable. If there is no limit, we can simply keep running trials until we stop seeing new numbers. The stopping condition will be influenced by 3 things:
- how many total trials we ran.
- how many trials since we last saw a new number.
- how many new numbers have we seen in total.
This would also require some numerical calculation.
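As a rough illustration of such a stopping rule, the sketch below keeps drawing until a fixed number of consecutive rounds reveal nothing new. The threshold `patience` is a hypothetical tuning knob that does not come from the question, and draws are assumed uniform with replacement; the collector itself never uses the true total except to generate draws.

```python
import random

def collect_until_quiet(total_choices, window_size=5, patience=50, rng=None):
    """Draw rounds until `patience` consecutive rounds reveal nothing new.

    Returns (distinct_seen, rounds_run).
    """
    rng = rng or random.Random()
    seen = set()
    rounds_run = 0
    quiet_rounds = 0                 # rounds since a new number last appeared
    while quiet_rounds < patience:
        drawn = {rng.randrange(total_choices) for _ in range(window_size)}
        rounds_run += 1
        if drawn <= seen:            # every drawn number was already seen
            quiet_rounds += 1
        else:
            seen |= drawn
            quiet_rounds = 0
    return len(seen), rounds_run

seen_count, rounds_run = collect_until_quiet(100, rng=random.Random(0))
print(seen_count, rounds_run)
```

A larger `patience` lowers the chance of stopping before everything has been seen, at the cost of more rounds.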
Already seen 32
100 rounds, unknown 303 choices, observed coupons seen
100 rounds, 0032.0 coupons seen, Actual coupon count: 032
100 rounds, 0033.0 coupons seen, Actual coupon count: 033
100 rounds, 0034.0 coupons seen, Actual coupon count: 034
100 rounds, 0035.0 coupons seen, Actual coupon count: 035
100 rounds, 0036.0 coupons seen, Actual coupon count: 036
100 rounds, 0037.0 coupons seen, Actual coupon count: 037
100 rounds, 0038.0 coupons seen, Actual coupon count: 038
100 rounds, 0039.0 coupons seen, Actual coupon count: 039
100 rounds, 0040.0 coupons seen, Actual coupon count: 040
100 rounds, 0041.0 coupons seen, Actual coupon count: 041
100 rounds, 0042.0 coupons seen, Actual coupon count: 042
100 rounds, 0043.0 coupons seen, Actual coupon count: 043
...
100 rounds, 095.67 coupons seen, Actual coupon count: 096
100 rounds, 096.57 coupons seen, Actual coupon count: 097
100 rounds, 097.61 coupons seen, Actual coupon count: 098
100 rounds, 098.65 coupons seen, Actual coupon count: 099
100 rounds, 099.38 coupons seen, Actual coupon count: 100
100 rounds, 100.68 coupons seen, Actual coupon count: 101
100 rounds, 101.33 coupons seen, Actual coupon count: 102
100 rounds, 102.16 coupons seen, Actual coupon count: 103
100 rounds, 103.38 coupons seen, Actual coupon count: 104
...
100 rounds, 241.13 coupons seen, Actual coupon count: 287
100 rounds, 242.92 coupons seen, Actual coupon count: 288
100 rounds, 242.84 coupons seen, Actual coupon count: 289
100 rounds, 244.23 coupons seen, Actual coupon count: 290
100 rounds, 245.96 coupons seen, Actual coupon count: 291
100 rounds, 244.69 coupons seen, Actual coupon count: 292
100 rounds, 245.07 coupons seen, Actual coupon count: 293
100 rounds, 246.15 coupons seen, Actual coupon count: 294
100 rounds, 247.56 coupons seen, Actual coupon count: 295
100 rounds, 247.08 coupons seen, Actual coupon count: 296
100 rounds, 0247.0 coupons seen, Actual coupon count: 297
100 rounds, 249.77 coupons seen, Actual coupon count: 298
100 rounds, 249.12 coupons seen, Actual coupon count: 299
100 rounds, 248.28 coupons seen, Actual coupon count: 300
100 rounds, 250.57 coupons seen, Actual coupon count: 301
100 rounds, 250.46 coupons seen, Actual coupon count: 302
100 rounds, 252.53 coupons seen, Actual coupon count: 303
100 rounds, 250.06 coupons seen, Actual coupon count: 304
100 rounds, 253.05 coupons seen, Actual coupon count: 305
...
100 rounds, 0318.7 coupons seen, Actual coupon count: 475
100 rounds, 321.94 coupons seen, Actual coupon count: 476
100 rounds, 321.56 coupons seen, Actual coupon count: 477
100 rounds, 321.42 coupons seen, Actual coupon count: 478
100 rounds, 321.94 coupons seen, Actual coupon count: 479
100 rounds, 322.56 coupons seen, Actual coupon count: 480
100 rounds, 321.94 coupons seen, Actual coupon count: 481
100 rounds, 320.27 coupons seen, Actual coupon count: 482
...
100 rounds, 325.26 coupons seen, Actual coupon count: 495
100 rounds, 325.77 coupons seen, Actual coupon count: 496
100 rounds, 327.29 coupons seen, Actual coupon count: 497
100 rounds, 325.05 coupons seen, Actual coupon count: 498
100 rounds, 0328.0 coupons seen, Actual coupon count: 499
Coupons seen -> Probability through Maximum Likelihood Estimation of prior probabilities
We would need MLE to calculate the PDF and CDF of the estimated total coupons given the number of coupons seen.
Around 328 coupons seen in 100 rounds points to the total coupon count being around 500.
Around 100 coupons seen in 100 rounds points to the total coupon count being around 100.
Maximum possible new coupons seen in 100 rounds = 500.
If we observed 500 new coupons in 100 rounds at 5 coupons per round, then the estimated total coupons is $\infty$.
If 450 coupons are seen it means the estimated total coupons are around 1500.
If 400 coupons are seen it means the estimated total coupons are around 900.
It seems the estimated total roughly doubles each time we halve the remaining distance to the maximum of 500 (5 selected every round for 100 rounds).
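These rough readings can be cross-checked by inverting the expected number of coupons seen. The sketch below is a method-of-moments estimate, not a full MLE, and assumes the same setup as the table: 32 coupons already seen, 100 rounds of 5 uniform draws with replacement. The function names are illustrative.

```python
def expected_seen(total, rounds=100, window_size=5, already_seen=32):
    """Expected distinct coupons after the rounds, starting from `already_seen`."""
    # probability that one particular coupon is never drawn in rounds*window_size draws
    miss = (1.0 - 1.0 / total) ** (rounds * window_size)
    return already_seen + (total - already_seen) * (1.0 - miss)

def estimate_total(observed, rounds=100, window_size=5, already_seen=32, cap=10**7):
    """Smallest total whose expected seen count reaches `observed` (bisection)."""
    if observed <= already_seen:
        return already_seen
    lo = max(already_seen, 1)
    hi = 2 * lo
    while expected_seen(hi, rounds, window_size, already_seen) < observed:
        hi *= 2
        if hi > cap:
            return float("inf")   # no finite total below the cap explains this count
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if expected_seen(mid, rounds, window_size, already_seen) < observed:
            lo = mid
        else:
            hi = mid
    return hi

for observed in (328, 400, 450):
    print(observed, "->", estimate_total(observed))
```

Because the expected count flattens out as the total grows, small changes in the observed count near the maximum translate into very large changes in the estimate.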
Python code that generated the a priori table shown above:
import numpy as np

already_seen = 32
print(f"Already seen {already_seen}")
seen = [x for x in range(already_seen)]
window_size = 5

def choose(total_choices):
    # np.random.choice samples with replacement by default
    return list(np.random.choice(total_choices, window_size))

def keep_rolling(total_choices):
    global seen
    new_seen = choose(total_choices)
    # merge the new draws into the distinct coupons seen so far
    seen = list(set([*seen, *new_seen]))

def get_seen():
    # unknown total coupons, between 32 and 499 (randint(468) returns 0..467)
    total_choices = already_seen + np.random.randint(468)
    for x in range(100):
        keep_rolling(total_choices)
    return [len(seen), total_choices]

trials = 10000
seen_frequency = {}
cumulative = 0
for i in range(trials):
    [seen_count, total_choices] = get_seen()
    if total_choices in seen_frequency.keys():
        seen_frequency[total_choices] = [*seen_frequency[total_choices], seen_count]
    else:
        seen_frequency[total_choices] = [seen_count]
    # reset to the 32 already-seen coupons for the next trial
    seen = [x for x in range(already_seen)]

# total_choices here is simply the value drawn in the last trial
print(f"100 rounds, unknown {total_choices} choices, observed coupons seen")
for x in list(sorted(seen_frequency.keys())):
    # average coupons seen over all trials that had this actual coupon count
    count = round(sum(seen_frequency[x]) / len(seen_frequency[x]), 2)
    print(f"100 rounds, {str(count).zfill(6)} coupons seen, Actual coupon count: {str(x).zfill(3)}")
print("Coupons seen -> Probability through Maximum Likelihood Estimation of prior probabilities")
Points of clarification on the questions asked
Is it good enough to have a Python program in which the values 5 and 100 can be replaced with any values of your choosing?
It would be much simpler to start with 5 of 6, or 5 of 20, than with 5 of 100. Is there any significance to
- 99% certainty
- 100 total choices
- Window size of 5
Other points:
- The variable "n" is used twice; it needs to be unique in the question. The definitions of "n", "m", "k", "j", "x" and the other variables probably need to be fixed.
- Your question implies that you want to know when you've seen 99% of the choices. Is there a reason to stop at having seen 99% of the total choices? The second question could mean either having seen 99% of the choices 1-100, or the 99th percentile of the CDF.
- Assuming you meant 99% of the total coupons seen, what confidence interval do you want for that 99% figure? If not specified, it will default to a 95% confidence interval over the 99% total coupons seen.