
I am trying to estimate the accuracy of a set of Monte Carlo simulations where my result is

\begin{equation} C=1-\frac{P_X(1)}{(P_Y(1))^2} \end{equation}

and $X$ and $Y$ are the results of two separate experiments, so they are independent. In particular, both simulations only output zeros or ones, so what I am measuring is the probability of getting a 1 in each of the two independently. To give an example, it's as if you had an unfair coin and you were trying to determine its bias by flipping it many times.
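To make this concrete, here is a minimal sketch (in Python; the probabilities and the sample size below are made up for illustration) of how I estimate $P_X(1)$, $P_Y(1)$ and then $C$ from the raw zeros and ones:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" probabilities of a 1 in each simulation
# (unknown in practice; they are what the experiments estimate).
p_x_true, p_y_true = 0.3, 0.7
N = 10_000  # sample size of each experiment

# Each simulation just returns a stream of 0s and 1s.
x_samples = rng.binomial(1, p_x_true, size=N)
y_samples = rng.binomial(1, p_y_true, size=N)

# Estimated probabilities of getting a 1, and the quantity of interest.
p_x = x_samples.mean()
p_y = y_samples.mean()
C = 1 - p_x / p_y**2
print(p_x, p_y, C)
```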

I already know that the conjugate prior of $X$ and $Y$ is a beta distribution. I am trying to compute how accurate my results can be given a sample of size $N$, and I would be satisfied if I could find the variance $\text{Var}(C)$ analytically.

I started by simplifying a bit and assuming the beta is symmetric, which yields $\text{Var}(X) = \frac{\mu (1-\mu)}{1 + N_r}$, where $\mu$ is the mean and $N_r$ is the size of the sample. Similarly, $\text{Var}(Y^2)$ by itself shouldn't be a problem: considering that the sample size is large, I could approximate $Y$ with a Gaussian and compute the variance of its square as in Mean and variance of Squared Gaussian: $Y=X^2$ where: $X\sim\mathcal{N}(0,\sigma^2)$? .
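As a quick sanity check of that formula (a sketch; the particular $\mu$ and $N_r$ below are arbitrary, and the identity actually holds for any $\text{Beta}(a,b)$ with $a+b=N_r$, not only the symmetric case), the variance of a $\text{Beta}(a,b)$ with $a+b=N_r$ and mean $\mu=a/(a+b)$ does come out as $\mu(1-\mu)/(1+N_r)$:

```python
from scipy.stats import beta

mu, N_r = 0.3, 1000                 # arbitrary mean and pseudo-sample size
a, b = mu * N_r, (1 - mu) * N_r     # Beta(a, b) with a + b = N_r

print(beta(a, b).var())             # exact Beta variance
print(mu * (1 - mu) / (1 + N_r))    # mu(1-mu)/(1+N_r) -- should match
```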

However, I already know that the variance and mean of the inverse of a normal do not exist, so how do you compute the variance of this ratio? Also, if you think that trying to compute the variance is not the right approach, feel free to suggest something better.

Enrico
  • Rather than "the number of samples" you should say "the size of the sample". – Michael Hardy Nov 23 '20 at 17:20
  • Some things seem unclear in your question. Do you assume $X$ and $Y$ are independent? Do they both have the same distribution? You wrote "the simulation only outputs zeros or ones". What does that have to do with $X$ and $Y$? Are they supposed to be probabilities of getting a $1$? Or is one of them supposed to be such a probability? And what does the other one of them have to do with it? – Michael Hardy Nov 23 '20 at 17:27
  • I will edit the question for more clarity. Thank you for the remarks – Enrico Nov 23 '20 at 17:39
  • It appears that your beta distribution is $$\text{constant} \cdot x^{\mu N_r-1} (1-x)^{(1-\mu)N_r-1}\, dx \quad \text{for } 0<x<1.$$ – Michael Hardy Nov 23 '20 at 17:45
  • @MichaelHardy Yes, that is the distribution that would be representative of $X$, and for that one the variance is known. But I am a bit stuck with the next step, which is to take the ratio and find the variance – Enrico Nov 23 '20 at 17:56
  • Your present version of the question does not say what $X$ and $Y$ have to do with the sequence of $0$s and $1$s. My present guess is that $C$ is the probability of getting a $1,$ but that's only a guess. Why don't you fully state the problem? – Michael Hardy Nov 23 '20 at 18:17
  • But my guess is wrong unless $C$ is between $0$ and $1.$ Do we have some reason to think that? – Michael Hardy Nov 23 '20 at 18:19
  • So I think the issue could be with my sloppy notation; maybe you can help me make it clearer. Let's consider one simulation for the moment. Say that the simulation is just the flipping of an unfair coin. Every time you query it, you get a 0 or a 1 as an output, in other words, a head or a tail. If you query it enough times you can estimate the probability of a 1 for that specific coin by taking the number of times you got 1 divided by the total number of queries (the sample size).

    In the exact same way, but for another coin, you measure P(1). That would be simulation Y.

    – Enrico Nov 23 '20 at 18:53
  • Finally, $C$ would be 1 minus the ratio of the probabilities of 1 of the two experiments. Maybe I should write $C= 1- P_x(1)/ P_y (1)$, where $ P_x(1)$ is the probability of getting a 1 in experiment $X$. I have to admit the notation is very confusing – Enrico Nov 23 '20 at 18:54
  • Are you saying $Y$ is the probability of getting a $1$ from the second coin? Is $X$ the probability of getting a $1$ with the first coin? – Michael Hardy Nov 24 '20 at 01:50
  • 1
    Yes, I am measuring C= 1 - Probability of 1 in first coin/(Probability of 1 in the second coin)^2 – Enrico Nov 24 '20 at 09:04
  • One point about notation: In the expression $P_X(x),$ there is a reason why a capital letter is used in the subscript and a lower-case letter as the argument. You have $P_X(x) = \Pr(X=x),$ where capital $X$ is the random variable and lower-case $x$ is any number that might be realized. – Michael Hardy Nov 24 '20 at 20:13
  • So you have $X,Y$ are independent, each distributed as $\text{constant} \cdot x^{\mu N_r-1} (1-x)^{(1-\mu)N_r-1}\, dx \quad \text{for } 0<x<1,$ and the conditional probability, given $X,$ that the first coin gives you a "head" is $X$, and $Y$ for the second coin, and you want a confidence interval for $1 - X/Y^2. \qquad$ – Michael Hardy Nov 24 '20 at 20:19
  • I think it would make more sense to use a posterior probability interval, and I suspect that is what you meant. – Michael Hardy Nov 24 '20 at 20:20
  • Now that we've cleared that up, I just noticed something else. Suppose your prior is a beta distribution, so it is $$ \text{constant}\cdot x^{\alpha-1} (1-x)^{\beta-1} \, dx \quad \text{for } 0<x<1. $$ You seem to have concluded that $\alpha+\beta = N_r.$ But I would expect the posterior distribution, but not the prior distribution, to depend on $N_r.$ Are you confusing those with each other? – Michael Hardy Nov 24 '20 at 20:37
  • I probably did confuse them. Essentially, I derived that the beta is representative of my test by using Bayes. In particular, if you try to compute the probability distribution given a set of data, you end up with the prior (which I set to a constant) times the likelihood (which is binomial) divided by the probability of the data. So this calculation yields a beta. As you correctly point out, I should call it the posterior, since it is something I infer from the data. As a prior for Bayes, I just took a constant, because before the experiment there is no preferential outcome. – Enrico Nov 25 '20 at 08:27
  • @MichaelHardy now that we've cleared things up, do you know what could be the way to proceed here? – Enrico Nov 26 '20 at 16:26

1 Answer


If I correctly understand the things posted in comments, you have $X,Y\sim\text{i.i.d.} \operatorname{Uniform}(0,1)$ and two sequences of Bernoulli random variables, both conditionally i.i.d. given $X,Y,$ with respective probabilities $X$ and $Y$ of being equal to $1.$

The conditional distributions of $X,Y$ given the numbers of $1\text{s}$ and $0\text{s}$ for both of those sequences are given by \begin{align} & \Pr(X\in A\mid\alpha,\beta,\gamma,\delta) = \int_A \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1} (1-x)^{\beta-1}\, dx \text{ for } A \subseteq[0,1] \\[8pt] & \Pr(Y\in A\mid\alpha,\beta,\gamma,\delta) = \int_A \frac{\Gamma(\gamma+\delta)}{\Gamma(\gamma)\Gamma(\delta)} x^{\gamma-1} (1-x)^{\delta-1}\, dx \text{ for } A \subseteq[0,1] \\[6pt] & \text{where } \alpha-1,\beta-1 \text{ are the respective numbers of 1s and 0s from the first} \\ & \text{sequence and } \gamma-1,\delta-1 \text{ from the second (the extra 1 in each parameter} \\ & \text{comes from the uniform prior).} \end{align} (This result is central to the famous posthumous paper of Thomas Bayes published in 1763.)

As an estimate of $C = 1 - \dfrac X {Y^2}$ my first thought is \begin{align} & \operatorname E(C\mid\alpha,\beta,\gamma,\delta) \\[8pt] = {} & \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} \cdot \frac{\Gamma(\delta+\gamma)}{\Gamma(\delta)\Gamma(\gamma)} \iint\limits_{[0,1]^2} \left( 1 - \frac x{y^2} \right) x^{\alpha-1} (1-x)^{\beta-1} y^{\gamma-1} (1 - y)^{\delta -1} \, d(x,y) \\[8pt] = {} & 1 - \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} \cdot \frac{\Gamma(\delta+\gamma)}{\Gamma(\delta)\Gamma(\gamma)} \cdot \iint\limits_{[0,1]^2} x^\alpha (1-x)^{\beta-1} y^{\gamma-3} (1 - y)^{\delta -1} \, d(x,y) \\[8pt] = {} & 1 - \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} \cdot \frac{\Gamma(\delta+\gamma)}{\Gamma(\delta)\Gamma(\gamma)} \left( \frac{\Gamma(\alpha+1)\Gamma(\beta)}{\Gamma(\alpha+1+\beta)} \cdot\frac{\Gamma(\gamma-2) \Gamma(\delta)}{\Gamma(\gamma-2+\delta)} \right) \\[8pt] = {} & 1 - \frac{(\gamma+\delta-2)(\gamma+\delta-1) \alpha }{(\alpha+\beta)(\gamma-2)(\gamma-1)}. \end{align}
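As a numerical cross-check of that closed form (a sketch, not part of the argument above; the counts are made up), one can sample $X$ and $Y$ from the two Beta distributions and compare the Monte Carlo average of $1 - X/Y^2$ with the formula:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up posterior parameters: alpha, beta for the first sequence,
# gamma, delta for the second.
alpha, beta_, gamma, delta = 40.0, 60.0, 70.0, 30.0

# Closed form derived above.
closed_form = 1 - (gamma + delta - 2) * (gamma + delta - 1) * alpha / (
    (alpha + beta_) * (gamma - 2) * (gamma - 1)
)

# Monte Carlo check: X ~ Beta(alpha, beta), Y ~ Beta(gamma, delta), independent.
n = 1_000_000
x = rng.beta(alpha, beta_, size=n)
y = rng.beta(gamma, delta, size=n)
mc_estimate = np.mean(1 - x / y**2)

print(closed_form, mc_estimate)   # should agree to roughly three decimals
```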

  • Thank you for your help so far. There is something I haven't understood about your answer: given the way you write $ \Pr(X\in A\mid\alpha,\beta,\gamma,\delta)$, isn't it always equal to 1? As far as I have understood, $ \int_A x^{\alpha-1} (1-x)^{\beta-1}\, dx= \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} \text{ for } A \subseteq[0,1]$, which means your expression cancels out to 1. – Enrico Nov 27 '20 at 08:23
  • @Enrico : The capital $A$ denotes an arbitrary (measurable) subset of $[0,1],$ not the whole interval. For example, if $A = [0.1,0.5]$ then $\int_A$ is $\int_{0.1}^{0.5},$ etc. Where I wrote $\text{“for $A\subseteq[0,1]$”},$ did $\text{“}{\subseteq}\text{”}$ look to you like $\text{“}{=}\text{”?} \qquad$ – Michael Hardy Nov 28 '20 at 02:19
  • I see, now it makes sense indeed. I'm sorry I overlooked that – Enrico Nov 28 '20 at 08:38
  • If it doesn't take too much of your time, may I ask why you set up the integral that way for $E(C)$? My intuition is that this is in a way similar to a weighted sum, where the relationship we established between the variables, $(1-x/y^2)$, gives you the "weight". But surely you have a much more rigorous explanation in mind. I am asking because I don't want to just copy-paste this answer into the work I am doing; it's necessary that I also understand it fully. Now that we have the expectation, do you have in mind how to get the variance? Should I expand and find $\text{E}[C^2]-\text{E}[C]^{2}$? – Enrico Nov 28 '20 at 08:53
  • @Enrico : Actually, one could just as reasonably write $$ \begin{align} & \operatorname E(C\mid\alpha,\beta,\gamma,\delta) = \operatorname E\left( 1 - \frac X{Y^2} \,\Big|\, \alpha,\beta,\gamma,\delta \right) \\[8pt] = {} & 1 - \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} \cdot \frac{\Gamma(\delta+\gamma)}{\Gamma(\delta)\Gamma(\gamma)} \iint\limits_{[0,1]^2} \frac x{y^2} \cdot x^{\alpha-1} (1-x)^{\beta-1} y^{\gamma-1} (1 - y)^{\delta -1} \, d(x,y) \\[4pt] & \text{and so on.} \end{align} $$ – Michael Hardy Nov 28 '20 at 20:49
  • Now, I guess you would get $E[C]^2$ directly by taking \begin{equation} E[C]^2=\left(1-\frac{(\gamma+\delta-2)(\gamma+\delta-1) \alpha }{(\alpha+\beta)(\gamma-2)(\gamma-1)} \right)^2, \end{equation} whereas \begin{equation} E[C^2]=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} \cdot \frac{\Gamma(\delta+\gamma)}{\Gamma(\delta)\Gamma(\gamma)} \iint\limits_{[0,1]^2} \left( 1 - \frac x{y^2} \right)^2 x^{\alpha-1} (1-x)^{\beta-1} y^{\gamma-1} (1 - y)^{\delta -1} \, d(x,y), \end{equation} work them out, and then take the difference? (A numerical sketch of this appears after these comments.) – Enrico Nov 29 '20 at 12:05
  • @Enrico : Yes, that would give you the variance. – Michael Hardy Nov 29 '20 at 18:31
  • Thank you, I will work that out. Actually, my test case is a bit more complex than this question, but I decided to start with a simpler question. In reality, instead of $P_X(1)$ I have a linear combination of different measurements, $c_1P_{X,1}(1)+c_2P_{X,2}(1)+\dots+c_nP_{X,n}(1)$. When I posed the question I thought the generalization was going to be easy, but it does not seem so trivial now. Essentially, to use your approach I guess I have to find the distribution that is representative of that sum and plug that one into the integral, whereas $(1-\frac{x}{y^2})$ stays the same. – Enrico Nov 30 '20 at 09:37
  • If you have any suggestions for that generalization I'd be grateful. I am looking for a general approach for $n$ terms, so I can't just write out the sum explicitly; it has to work for any $n$, where $n<20$ is small compared to the sample size of each individual experiment – Enrico Nov 30 '20 at 10:03
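For reference, a minimal sketch (reusing the made-up posterior parameters from the check above; the weights $c_i$ and per-term counts in the generalized case are invented for illustration) of how the posterior variance of $C$, and the generalization with a weighted sum in the numerator, can be estimated by sampling from the Beta posteriors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# --- Variance of C = 1 - X/Y^2 for the simple case ----------------------
alpha, beta_, gamma, delta = 40.0, 60.0, 70.0, 30.0   # made-up counts

x = rng.beta(alpha, beta_, size=n)
y = rng.beta(gamma, delta, size=n)
c = 1 - x / y**2

e_c = c.mean()
var_c = (c**2).mean() - e_c**2        # E[C^2] - E[C]^2
print(e_c, var_c)

# --- Generalization: the numerator is a weighted sum of independent
# estimates c_1 P_{X,1}(1) + ... + c_n P_{X,n}(1), each with its own
# Beta posterior (invented weights and counts below).
weights = np.array([0.5, 0.3, 0.2])
ones = np.array([12.0, 30.0, 7.0])    # counts of 1s per experiment
zeros = np.array([88.0, 70.0, 93.0])  # counts of 0s per experiment

x_terms = np.stack([rng.beta(a, b, size=n) for a, b in zip(ones, zeros)])
x_sum = weights @ x_terms             # weighted sum, one value per sample
c_gen = 1 - x_sum / y**2

print(c_gen.mean(), c_gen.var())
```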