
Let $X_1,\dots,X_n \sim N(\mu, \sigma^2)$, and let $\tau$ be the 0.98 quantile, i.e. $P(X < \tau) = 0.98$:

  1. Find the MLE of $\tau$.
  2. Find an expression for an approximate $1 - \alpha$ confidence interval for $\tau$.

My Attempt is as follows:

We know that the MLEs of the mean and variance of a normal distribution are given by $\displaystyle \mu_{mle}=\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i$ and $\displaystyle \sigma^2_{mle}=\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$.

$\displaystyle P(X < \tau)=P(Z < \frac{\tau - \mu}{\sigma})=0.98$

From inverse normal distribution tables we know that when $\displaystyle P(Z < z)=0.98$ then $z=2.053749$.

i.e. $\displaystyle \frac{\tau - \mu}{\sigma}=2.053749$ or $\tau=2.053749 \sigma + \mu$.

Plugging in MLE values of $\mu, \sigma$, we have $\displaystyle \tau_{mle}= \bar{X} + 2.053749 \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}$
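For concreteness, here is a minimal Python sketch (mine, not part of the original attempt) of this plug-in estimate; the simulated sample, its size, and its mean/standard deviation are made-up values, and `numpy`/`scipy` are assumed to be available.

```python
# Sketch only: plug-in / MLE estimate of the 0.98 quantile on simulated data.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=200)    # hypothetical sample
n = x.size

z98 = norm.ppf(0.98)                             # 2.053749...
mu_mle = x.mean()
sigma_mle = np.sqrt(np.mean((x - mu_mle) ** 2))  # MLE of sigma (divisor n)
tau_mle = mu_mle + z98 * sigma_mle               # MLE of tau by invariance
print(tau_mle)
```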

We know that for a single sample, $\displaystyle P(X < \tau)=\frac{1}{\sigma \sqrt{2 \pi}} \int_{-\infty}^{\tau} e^{-\frac{(x - \mu)^2}{2 \sigma^2}} dx$

Therefore, $\displaystyle f(X;\tau)= \frac{\partial}{\partial \tau} \bigg(\frac{1}{\sigma \sqrt{2 \pi}} \int_{-\infty}^{\tau} e^{-\frac{(x - \mu)^2}{2 \sigma^2}} dx\bigg)=\frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{(\tau - \mu)^2}{2 \sigma^2}}$.

The Fisher information of a single sample (from Wikipedia) is given by,

$\displaystyle E\bigg[\bigg(\frac{\partial}{\partial \tau} \log f(X; \tau)\bigg)^2 \bigg| \tau\bigg]$

Plugging in $\displaystyle f(X;\tau)$ from above, we get,

$\displaystyle E\bigg[\bigg(\frac{\partial}{\partial \tau} \log \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{(\tau - \mu)^2}{2 \sigma^2}} \bigg)^2 \bigg| \tau \bigg] = E\bigg[ \frac{(\tau - \mu)^2}{\sigma^4}\bigg| \tau \bigg] = \frac{(\tau - \mu)^2}{\sigma^4}$

Hence, $\displaystyle I_n(\tau)= n\frac{(\tau-\mu)^2}{\sigma^4}$ which gives s.e.=$\displaystyle \sqrt{\frac{1}{I_n(\tau)}}=\sqrt{\frac{\sigma^4}{n(\tau - \mu)^2}}$

But from above, we know that $\displaystyle \frac{\tau - \mu}{\sigma}=2.053749$, and taking the reciprocal of its square gives $\displaystyle \frac{\sigma^2}{(\tau - \mu)^2}=0.237086$.

Therefore, the CI is $\displaystyle \tau_{mle} \pm Z_{\alpha/2}\, \sigma_{mle}\sqrt{\frac{0.237086}{n}}$, i.e. $\displaystyle \tau_{mle} \pm Z_{\alpha/2} \frac{\sigma_{mle}}{2.053749\sqrt{n}}$.
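Continuing the sketch above (it reuses `x`, `n`, `z98`, `sigma_mle`, and `tau_mle`), this is how that first interval could be evaluated; the 95% level is just an example.

```python
# Continuing the sketch above: interval from the first calculation,
# with s.e. = sigma_mle / (2.053749 * sqrt(n)).
alpha = 0.05
z = norm.ppf(1 - alpha / 2)
se1 = sigma_mle / (z98 * np.sqrt(n))   # = sigma_mle * sqrt(0.237086 / n)
ci1 = (tau_mle - z * se1, tau_mle + z * se1)
print(ci1)
```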

Alternative solution

We also know that the Fisher information is the negative of the expectation of the second derivative of the log-likelihood, as follows:

$\displaystyle -E\bigg[\frac{\partial^2}{\partial \tau^2} \log f(X; \tau) \bigg| \tau\bigg]=E\bigg[\frac{1}{\sigma^2} \bigg| \tau \bigg]=\frac{1}{\sigma^2}$

Hence $\displaystyle I_n(\tau) = \frac{n}{\sigma^2}$ which gives s.e.=$\displaystyle \sqrt{\frac{1}{I_n(\tau)}}=\frac{\sigma}{\sqrt{n}}$

Therefore, the CI is $\displaystyle \tau_{mle} \pm Z_{\alpha/2} \frac{\sigma_{mle}}{\sqrt{n}}$.
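Continuing the same sketch, the alternative interval can be printed next to the first one; this only shows numerically that the two procedures disagree, not which one is right.

```python
# Continuing the sketch above: interval from the alternative calculation,
# with s.e. = sigma_mle / sqrt(n), compared against the first interval.
se2 = sigma_mle / np.sqrt(n)
ci2 = (tau_mle - z * se2, tau_mle + z * se2)
print(ci1)   # first interval (narrower)
print(ci2)   # alternative interval (wider)
```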

Why are the two results different?

pavybez
  • Don't you have a rather foundational problem in that $f$ is not only a function of $\tau$? – Ian Jun 20 '21 at 22:52
  • This used to confuse me a lot too, but the data come from a random distribution and the parameters do not. If $\tau$ is a function of the parameters, then, by definition, the MLE of $\tau$ is that function evaluated at the MLEs of the actual parameters. – William M. Jun 20 '21 at 22:53
  • Then, you will find that $\hat \tau = a \bar x + b s,$ where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation. For the normal distribution, the sample mean and sample variance are independent random variables! The sample mean is $N(\mu, \sigma^2/n)$ and $(n-1)s^2/\sigma^2$ is $\chi^2_{n-1}.$ – William M. Jun 20 '21 at 22:57
  • So, you could calculate the density of the sum $a \bar{x} + b s$ or you can use simulation. If I were you, I'd use the bootstrap if the sample size were already a couple hundred (likely); see the sketch after these comments. – William M. Jun 20 '21 at 22:58
  • If $n$ is very large, you can even use that $\chi_{n-1}^2$ is the sum of $n-1$ independent random variables with mean $1$ and variance $2$ and use CLT. – William M. Jun 20 '21 at 22:59
  • @Ian that is not a problem; if $\mu$ and $\sigma^2$ are unknown, there are confidence intervals for each of them. – Vons Jun 21 '21 at 01:37
  • I'm referring to symbols like $f(X;\tau)$ and differentiating that. If you just define the MLE of a function of the parameters to be that function of the MLEs of the parameters then that's fine. – Ian Jun 21 '21 at 02:19
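As a rough illustration of the bootstrap suggestion in the comments, here is a hedged sketch (mine, not from the discussion) of a percentile-bootstrap interval for $\tau$; the simulated sample and the number of resamples `B` are arbitrary choices.

```python
# Sketch only: percentile-bootstrap CI for tau = mu + z_{0.98} * sigma.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=200)    # hypothetical sample
z98 = norm.ppf(0.98)

def tau_hat(sample):
    # plug-in / MLE estimate of the 0.98 quantile
    return sample.mean() + z98 * np.sqrt(np.mean((sample - sample.mean()) ** 2))

B = 5000                                         # number of bootstrap resamples
boot = np.array([tau_hat(rng.choice(x, size=x.size, replace=True))
                 for _ in range(B)])
lower, upper = np.percentile(boot, [2.5, 97.5])  # 95% percentile interval
print(tau_hat(x), (lower, upper))
```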

1 Answer


The MLE of $\tau$ is correct, by the invariance property of MLEs.

An exact confidence interval for the $p$-th quantile of a normal distribution is:

$$\left(\bar X - t_{1-\frac\alpha 2,\, [n-1,\, -\sqrt n z_p]}\frac{s}{\sqrt n},\ \bar X - t_{\frac\alpha 2,\, [n-1,\, -\sqrt n z_p]}\frac{s}{\sqrt n}\right)$$

with $z_p=\Phi^{-1}(p)$, $s=\sqrt{\frac{\sum_{i=1}^n (X_i-\bar X)^2}{n-1}}$ the sample standard deviation, and $t_{q,\,[n-1,\,\delta]}$ the $q$ quantile of the non-central $t$-distribution with $n-1$ degrees of freedom and non-centrality parameter $\delta$. So the $1-\alpha$ confidence interval for the 98th percentile is

$$\left(\bar X - t_{1-\frac\alpha 2, [n-1, -\sqrt n 2.05]}\frac{s}{\sqrt n}, \bar X - t_{\frac\alpha 2, [n-1, -\sqrt n 2.05]}\frac{s}{\sqrt n}\right)$$

with $2.05$ obtained (after rounding) from $\Phi^{-1}(0.98)\approx 2.0537$, where $\Phi^{-1}$ is the quantile function of the standard normal distribution.
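A minimal sketch of this non-central-$t$ interval, assuming `scipy.stats.nct` is acceptable and using simulated data in place of a real sample.

```python
# Sketch only: the non-central-t interval above, on simulated data.
import numpy as np
from scipy.stats import norm, nct

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=200)   # hypothetical sample
n = x.size

p, alpha = 0.98, 0.05
zp = norm.ppf(p)                     # Phi^{-1}(0.98) = 2.053749...
s = x.std(ddof=1)                    # sample standard deviation (divisor n - 1)
ncp = -np.sqrt(n) * zp               # non-centrality parameter

lower = x.mean() - nct.ppf(1 - alpha / 2, n - 1, ncp) * s / np.sqrt(n)
upper = x.mean() - nct.ppf(alpha / 2, n - 1, ncp) * s / np.sqrt(n)
print(lower, upper)
```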

See the Technical Details section of this paper and The parametric CI method (in the Gaussian case) of this paper (both give the same result/formula).

Vons
  • Thanks, but my question is why the two Fisher information procedures produce different results. – pavybez Jun 21 '21 at 03:24
  • @pavybez I think there's an issue with your procedure; if both $\mu$ and $\sigma^2$ are unknown, it doesn't make sense to take the Fisher information in $\tau$ alone [a single parameter]. – Vons Jun 21 '21 at 04:33