
Let's assume the multinomial distribution

$P(c_0, \ldots, c_{K-1}) = \binom{n}{c_0, \ldots, c_{K-1}} \prod_{j=0}^{K-1} p_j^{c_j},$

where $\sum_{j=0}^{K-1} c_j = n$ and $\sum_{j=0}^{K-1} p_j = 1$.

I am interested in the relation between the expected number of distinct outcomes, let's call it $\bar{k}_{diff}$, and the entropy $H = -\sum_{j=0}^{K-1} p_j\log_2{p_j}$. Clearly there is some relation between the two quantities. For example:
1) As $H\rightarrow 0$, $\bar{k}_{diff} \rightarrow 1$.
2) If $H \rightarrow \log_2 K$ is maximal (thus $p_j = \frac{1}{K}$ for all $j$), then $\bar{k}_{diff} \rightarrow \min \{K,n\}$. (Both limit cases are illustrated in the short simulation below.)
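
For concreteness, here is a small Monte Carlo sketch (Python/NumPy; the particular probability vectors and the values $K = 10$, $n = 50$ are arbitrary choices for illustration) that reproduces both limit cases:

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def mean_k_diff(p, n, trials=20_000):
    """Monte Carlo estimate of the expected number of distinct
    outcomes among n draws from a multinomial with probabilities p."""
    counts = rng.multinomial(n, p, size=trials)  # shape (trials, K)
    return (counts > 0).sum(axis=1).mean()       # distinct categories per trial

K, n = 10, 50
near_deterministic = np.array([1 - 1e-3] + [1e-3 / (K - 1)] * (K - 1))  # H near 0
uniform = np.full(K, 1.0 / K)                                           # H = log2(K)

for p in (near_deterministic, uniform):
    print(f"H = {entropy(p):.4f}  ->  k_diff ~ {mean_k_diff(p, n):.4f}")
```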

Is there a way to upper-bound (and perhaps lower-bound) $\bar{k}_{diff}$ in terms of $H$?

My first attempt went like this: assume $K<n$, and let $X_j$ be the indicator random variable that equals 1 if the $j$-th category appears among the $n$ outcomes. Then $k_{diff} = \sum_{j=0}^{K-1} X_j$ and $P(X_j = 1) = 1- (1-p_j)^n$. Hence

$$\begin{aligned}
\bar{k}_{diff} = \mathbb{E}[k_{diff}] &= \sum_{j=0}^{K-1} P(X_j = 1) = \sum_{j=0}^{K-1} \left(1- (1-p_j)^n\right) = 1 + (K-1) - \sum_{j=0}^{K-1}(1-p_j)^n \\
&< 1 + (K-1) - \sum_{j=0}^{K-1}(1-p_j)^K \\
&< 1 + (K-1) - (K-1)\sum_{j=0}^{K-1}p_j^K = 1 + (K-1)^2 H_T^K \\
&< 1 + (K-1)^2 H.
\end{aligned}$$

The last inequality holds because the Tsallis entropy of order $K$, $H_T^K = \frac{1}{K-1}\left(1 - \sum_{j=0}^{K-1}p_j^K\right)$, is smaller than Shannon's entropy whenever $K>1$. However, I am not very satisfied with this result, for the following reasons:
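
Each step of this chain can be checked numerically; below is a quick sanity-check sketch (the Dirichlet draw is just an arbitrary way of generating a probability vector, and $K = 5$, $n = 20$ are arbitrary values with $K < n$):

```python
import numpy as np

rng = np.random.default_rng(1)

K, n = 5, 20                       # an arbitrary example with K < n
p = rng.dirichlet(np.ones(K))      # a random probability vector

exact = np.sum(1 - (1 - p) ** n)   # E[k_diff] in closed form
sim = (rng.multinomial(n, p, size=50_000) > 0).sum(axis=1).mean()

H = -(p * np.log2(p)).sum()                  # Shannon entropy (bits)
H_T = (1 - np.sum(p ** K)) / (K - 1)         # Tsallis entropy of order K

print(f"E[k_diff] (closed form) : {exact:.4f}")
print(f"E[k_diff] (simulation)  : {sim:.4f}")
print(f"K - sum_j (1-p_j)^K     : {K - np.sum((1 - p) ** K):.4f}")
print(f"1 + (K-1)^2 H_T^K       : {1 + (K - 1) ** 2 * H_T:.4f}")
print(f"1 + (K-1)^2 H           : {1 + (K - 1) ** 2 * H:.4f}")
```

Comparing the printed values shows directly which of the inequalities actually hold for a given $p$.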

1) I am not entirely sure that the inequality $\sum_{j=0}^{K-1}(1-p_j)^K > (K-1)\sum_{j=0}^{K-1}p_j^K$ holds (my intuition tells me so, but I could of course be wrong; a random spot check is sketched after this list).
2) Even if 1) holds, the above upper bound is only valid for $K<n$ (I would love to have an upper bound that holds for any pair $(n, K)$).
3) This upper bound is clearly far too high: as $H$ approaches its maximum $\log_2 K$, the bound grows like $(K-1)^2\log_2 K$ (instead of approaching $K$, as it should).
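
Regarding point 1, here is a quick random spot check of the inequality (only a sketch, of course; passing it proves nothing):

```python
import numpy as np

rng = np.random.default_rng(2)

# Spot-check sum_j (1-p_j)^K >= (K-1) * sum_j p_j^K on random
# probability vectors for various K; count any violations.
violations = 0
for _ in range(10_000):
    K = int(rng.integers(2, 20))      # random alphabet size
    p = rng.dirichlet(np.ones(K))     # random probability vector
    lhs = np.sum((1 - p) ** K)
    rhs = (K - 1) * np.sum(p ** K)
    violations += lhs < rhs

print(f"violations: {violations} / 10000")
```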

Any suggestions on how to approach this problem are very much appreciated!

Simon
