
I'm trying to see if I can get the probability of a rider finishing in specific positions in a race, based on probabilities of finishing 1st, top 3 and top 10.

As an example I could have the probabilities below:

Rider A:

  • Win prob: 0.22
  • Top 3 prob: 0.55
  • Top 10 prob: 0.73

And:

Rider B:

  • Win prob: 0.07
  • Top 3 prob: 0.35
  • Top 10 prob: 0.70

The total number of riders will be somewhere between $100$ and $180$. I know the exact number for each race, of course, and have checked that, summed over all riders, Σ(P(1)) = 1, Σ(P(top3)) = 3 and Σ(P(top10)) = 10.

To calculate the probability of a rider finishing 2nd, I currently use:

P(2) = P(1) / P(top3) / (1/3) * 0.5 * (P(top3) - P(1))

For rider A this would be equal to 0.20, and thus give a probability of finishing exactly 3rd of

P(3) = P(top3) - P(1) - P(2) = 0.55 - 0.22 - 0.20 = 0.13
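
The arithmetic above can be checked with a short sketch. The function name `second_third` is made up for illustration; the formula is the one stated above, applied to riders A and B.

```python
def second_third(p_win, p_top3):
    """Heuristic split of P(top3) - P(1) into P(2) and P(3)."""
    remainder = p_top3 - p_win
    # P(1) / P(top3) / (1/3) * 0.5 is the share of the remainder given to 2nd place
    share_second = p_win / p_top3 / (1 / 3) * 0.5
    p2 = share_second * remainder
    p3 = remainder - p2
    return p2, p3


for name, p_win, p_top3 in [("A", 0.22, 0.55), ("B", 0.07, 0.35)]:
    p2, p3 = second_third(p_win, p_top3)
    print(f"Rider {name}: P(2) = {p2:.3f}, P(3) = {p3:.3f}")
```

One thing the sketch makes visible: the share given to 2nd place is 1.5 * P(1) / P(top3), so it exceeds 1 whenever P(1) is more than two thirds of P(top3), and P(3) then comes out negative.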

but I'm unsure whether there is a better method, and this one is certainly too inaccurate for calculating the other positions.

Ideally, I want a table of probabilities, for every rider, of finishing in every specific position between 1 and 15, based on win prob., top 3 prob. and top 10 prob.

I found this thread, "Given every horse's chance of winning a race, what is the probability that a specific horse will finish in nth place?", which discusses a method to calculate the chance of finishing second based on the probability of finishing first. Using this I get P(2) = 0.18 for rider A.

But I am unsure whether I can use this method to calculate P(4), P(5), etc. The relation between P(1), P(top3) and P(top10) is not always the same, so I would prefer a method that takes this into account, especially for calculating the later finishing positions.
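
For reference, a minimal sketch of one common way to get second-place probabilities from win probabilities alone. It assumes the linked thread's method is the usual conditional renormalisation (often called the Harville model): if rider j wins, rider i is second with probability p_i / (1 - p_j). The 100-rider field below is hypothetical (riders A and B plus 98 equal outsiders), so the result is not expected to match the 0.18 quoted above.

```python
def harville_second(win_probs):
    """P(rider i is 2nd) = sum over j != i of p_j * p_i / (1 - p_j)."""
    n = len(win_probs)
    second = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j != i:
                total += win_probs[j] * win_probs[i] / (1.0 - win_probs[j])
        second.append(total)
    return second


# Hypothetical field: riders A and B from the question plus 98 equal outsiders
rest = (1.0 - 0.22 - 0.07) / 98
field = [0.22, 0.07] + [rest] * 98
p_second = harville_second(field)
print(f"Rider A: P(2) = {p_second[0]:.3f}, sum over riders = {sum(p_second):.3f}")
```

The same conditioning can be repeated for 3rd, 4th and so on, but it uses only the win probabilities, so the known P(top3) and P(top10) values play no part in it.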

Edit: I always know the chance of every rider finishing first and in the top 3. In most cases I know the chance of every rider finishing in the top 10 as well. These values come from betting odds, corrected so that the probabilities sum to the right totals.

I'm not expecting an exact solution, but an estimate that I can build on. I will compare the predictions against actual results afterwards to see how well they correspond with reality.

NicoB
  • 11
  • What do you mean by "Σ(P(1)) = 1, Σ(P(top3)) = 3 and Σ(P(top10)) = 10"? –  May 30 '22 at 13:42
  • It's not very clear to me what you know and what you want to know. Do you know P1, P3 and P10 for every one of the (say) n=100 riders? That would be 3n numbers. Or do you only know this for two of the riders, so 6 numbers? Perhaps it would help if you explained how you know these things: are you trying to backsolve from betting odds, for instance? In any case, I expect you will have either too much or too little data for an exact solution. You will probably need to build a model to estimate performance and then estimate the parameters of the model based on the information you have. (out of spa – Blitzer May 30 '22 at 13:49
  • Just that I checked that the probabilities add up across all the riders, i.e. the win probabilities for all riders sum to 1.00, the top 3 finish probabilities sum to 3.00 and the top 10 finish probabilities sum to 10.00 – NicoB May 30 '22 at 13:49
  • @Blitzer Sorry about that, I added that as an edit. I always know the probability of every rider finishing 1st and the probability of every rider finishing in the top 3 as well. And I do realize that this will be an estimated solution, but that is not a problem. I will check the model afterwards to see how well it fits the actual outcomes. And correct, the win probability and top 3 probability are known from betting odds. – NicoB May 30 '22 at 13:54
  • How about you say that the time taken by rider $i$ is $T_i\sim N(\mu_i, \sigma_i^2)$. Pop it into some sort of Monte Carlo estimator and you get your parameters and then can get whatever statistics you need. I think I used https://en.wikipedia.org/wiki/Stochastic_gradient_descent to do something like this once - although I'm a bit blurry on the details. – Blitzer May 30 '22 at 14:07
  • I was hoping to start out with something simpler to test, but I think this could be a very good solution. I'll see what I can figure out, thanks. – NicoB May 30 '22 at 17:58
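
The Monte Carlo idea from the comments could look roughly like the sketch below. It only covers the forward step: given $\mu_i$ and $\sigma_i$, simulate independent normal finishing times and count how often each rider lands in each position. The function name `simulate_positions` and the five-rider parameters are hypothetical and not fitted to any odds.

```python
import random


def simulate_positions(mu, sigma, n_sims=2000, seed=0):
    """Estimate P(rider i finishes in position k) by simulating finishing times."""
    rng = random.Random(seed)
    n = len(mu)
    counts = [[0] * n for _ in range(n)]  # counts[i][k]: rider i in position k + 1
    for _ in range(n_sims):
        times = [rng.gauss(mu[i], sigma[i]) for i in range(n)]
        order = sorted(range(n), key=lambda i: times[i])  # fastest rider first
        for pos, rider in enumerate(order):
            counts[rider][pos] += 1
    return [[c / n_sims for c in row] for row in counts]


# Hypothetical parameters for a 5-rider field (illustration only)
mu = [100.0, 100.5, 101.0, 101.0, 102.0]
sigma = [1.0, 1.0, 1.5, 0.5, 2.0]
probs = simulate_positions(mu, sigma)
p_win = [row[0] for row in probs]
p_top3 = [sum(row[:3]) for row in probs]
```

Each row of `probs` is one rider's position distribution, and each column sums to 1 by construction. Fitting would need an outer loop that adjusts `mu` and `sigma` until the simulated P(1), P(top3) and P(top10) match the values from the odds; that part is not shown here.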

1 Answer


Here's a simpler approach than the one we discussed in the comments.

Let $P_i$ be the random variable representing the position of rider $i$. Clearly the $P_i$ are not independent, but let's just pretend that they are. Assume that $P_i$ follows a negative binomial distribution with parameters $p,r$. Why? Well, it's discrete and it has two parameters, which feels about right. We can fit the best negative binomial distribution we can to the 2 or 3 data points you have for each rider; it's then trivial to calculate the probability of any particular outcome for that rider. Repeat this for every rider. Finally, you'll need to tweak or scale the probabilities so that the total probability that somebody finishes in each position is 1.
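
A minimal sketch of the per-rider fit, under a few assumptions that are not spelled out above: the negative binomial starts at 0, so the code models $P_i - 1$ rather than $P_i$; the fit is a coarse grid search minimising squared error on the three cumulative targets; and the grid ranges for $r$ and $p$ are arbitrary. The helper names are made up for illustration.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial pmf: k failures before the r-th success, success prob p."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def position_probs(r, p, max_pos=15):
    """P(position = 1..max_pos), assuming position - 1 ~ NB(r, p)."""
    return [nb_pmf(k, r, p) for k in range(max_pos)]


def fit_rider(p_win, p_top3, p_top10):
    """Coarse grid search for (r, p) matching the three cumulative targets."""
    best = None
    for r in [0.1 * a for a in range(1, 51)]:         # r in 0.1 .. 5.0 (assumed range)
        for p in [0.005 * b for b in range(1, 200)]:  # p in 0.005 .. 0.995
            probs = position_probs(r, p, 10)
            err = ((probs[0] - p_win) ** 2
                   + (sum(probs[:3]) - p_top3) ** 2
                   + (sum(probs) - p_top10) ** 2)
            if best is None or err < best[0]:
                best = (err, r, p)
    return best[1], best[2]


r, p = fit_rider(0.22, 0.55, 0.73)  # rider A from the question
table = position_probs(r, p)        # unscaled P(1), ..., P(15) for rider A
```

With two parameters and three targets the match will generally not be exact. The final scaling step needs the tables for the whole field: for each position, divide every rider's value by the sum over all riders so that the column totals 1.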

This won't be perfect but will perhaps give you a simple and effective route to get what you are looking for.

Blitzer
  • 2,210