I'm interested in estimating Shannon's entropy for a discrete multivariate random variable $X$ that has high dimensionality (i.e. $X=(X_1,\dots,X_n)$, where $n$ is in the hundreds).
I can efficiently sample from $X$ and evaluate $P(X=x)$ for any given $x$. But due to the high dimensionality, I cannot enumerate all possible values to calculate the entropy exactly.
A straightforward Monte Carlo estimator seems to be:
Sample a large number ($m$) of observations $\{o_1,\dots,o_m\}$ from $X$ and take the sample average of their negative log-probabilities: $\hat{H}(X)=-\frac{1}{m}\sum_{i=1}^{m}{\log{p(o_i)}}$.
This approximates $H(X)=-\sum_{x\in\mathcal{X}}{p(x)\log{p(x)}}$, where the sum runs over $\mathcal{X}$, the (intractably large) set of all possible values of $X$.
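For concreteness, here is a minimal sketch of what I mean in Python; `sample_x` and `log_p` are hypothetical placeholders for the two operations assumed above (drawing one observation from $X$ and evaluating $\log P(X=x)$):

```python
import numpy as np

def mc_entropy(sample_x, log_p, m=100_000):
    """Monte Carlo entropy estimate: the mean of -log p(o_i) over m draws."""
    log_probs = np.array([log_p(sample_x()) for _ in range(m)])
    h_hat = -log_probs.mean()                   # \hat{H}(X)
    se = log_probs.std(ddof=1) / np.sqrt(m)     # Monte Carlo standard error of the mean
    return h_hat, se
```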
- Is this a correct approach? Am I missing something?
- Is this estimator biased? And if so, why?
- If this is an unbiased estimator, what issue do entropy estimation methods such as those mentioned in this question address? Does the bias arise in the setting where we cannot evaluate $\log{p(x)}$ and must resort to noisy occurrence counts or histograms (as in the sketch after this list)?
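To make that last distinction concrete, this is the count-based setting I have in mind (a minimal sketch; `plugin_entropy` is just an illustrative name): the plug-in estimator computes the entropy of the empirical distribution built from occurrence counts, with no access to the exact $\log p(x)$.

```python
from collections import Counter

import numpy as np

def plugin_entropy(observations):
    """Plug-in estimate: entropy of the empirical distribution \\hat{p}.

    `observations` must be hashable values (e.g. tuples), since the
    estimator only uses how often each distinct value occurs.
    """
    counts = Counter(observations)
    m = len(observations)
    p_hat = np.array([c / m for c in counts.values()])
    return float(-(p_hat * np.log(p_hat)).sum())   # H(\hat{p})
```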