
For a random variable $X$, the cumulant generating function $CGF_X$ is defined as $CGF_X(t)=\log Ee^{tX}$, and the $n$th cumulant $k_n(X)$ is defined as the coefficient of $t^n/n!$ in the corresponding power series. The cumulant $k_n$ has the following properties:

  1. $k_n(X+Y)=k_n(X)+k_n(Y)$ if $X$ and $Y$ are independent (additivity)
  2. $k_n(cX)=c^nk_n(X)$ for any scalar $c$ (homogeneity)
  3. $k_n(X)=p_n(EX, EX^2,\dots, EX^n)$ where $p_n$ is a universal polynomial (i.e. does not depend on $X$)

Now suppose I have some other function $k'$ that satisfies properties 1-3. Is $k'$ necessarily a scalar multiple of $k_n$?
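
For concreteness, here is a quick symbolic sanity check of properties 1 and 2 for $n=3$ (a small sympy sketch; it just computes $k_n$ from symbolic moments via the series expansion of the CGF):

```python
# Sketch: verify additivity and homogeneity of k_3, computed from symbolic moments.
import sympy as sp
from math import comb

t, c = sp.symbols('t c')
N = 3

def cumulant(moms, n):
    """n-th cumulant from the moment list (m1, ..., mN), via log of the truncated MGF."""
    mgf = 1 + sum(moms[i - 1] * t**i / sp.factorial(i) for i in range(1, len(moms) + 1))
    cgf = sp.expand(sp.log(mgf).series(t, 0, n + 1).removeO())
    return sp.expand(sp.factorial(n) * cgf.coeff(t, n))

x = sp.symbols('x1:4')   # moments of X: x1, x2, x3
y = sp.symbols('y1:4')   # moments of Y: y1, y2, y3

def sum_moment(a, b, i):
    """E[(X+Y)^i] for independent X, Y: binomial convolution of moments (with m_0 = 1)."""
    aa, bb = (1,) + a, (1,) + b
    return sum(comb(i, h) * aa[i - h] * bb[h] for h in range(i + 1))

sum_moms = [sp.expand(sum_moment(x, y, i)) for i in range(1, N + 1)]
scaled_moms = [c**i * x[i - 1] for i in range(1, N + 1)]

# additivity: k_3(X + Y) = k_3(X) + k_3(Y) for independent X, Y
print(sp.expand(cumulant(sum_moms, N) - cumulant(x, N) - cumulant(y, N)))   # 0
# homogeneity: k_3(cX) = c^3 k_3(X)
print(sp.expand(cumulant(scaled_moms, N) - c**N * cumulant(x, N)))          # 0
```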

Motivation: The higher order cumulants can be somewhat mysterious; having a characterization like the above would make them seem much more natural. Alternatively, it would be interesting if there are other invariant polynomials besides the cumulants.

Simon Segert

3 Answers


Let $x_i=\mathbb E[X^i]$ and $y_i=\mathbb E[Y^i]$. For independent $X$ and $Y$ we have $$\mathbb E[(X+Y)^i]=\sum_h\binom ih x_{i-h}y_h$$ (with the convention $x_0=y_0=1$), so define the Hurwitz product $*:\mathbb R^n\times\mathbb R^n\to\mathbb R^n$ by $$x*y=\left(\sum_h\binom ih x_{i-h}y_h\right)_{i\in [n]}.$$ The additivity condition then reads $p_n(x*y)=p_n(x)+p_n(y)$ and the homogeneity condition reads $p_n(cx_1,\dots,c^nx_n)=c^np_n(x_1,\dots,x_n)$, and a priori these only need to hold for vectors $x,y$ whose entries are the moments of a random variable.

The existence theorem for the moment problem says that there exists such a random variable for the vector $(m_i)_i$ if and only if the Hankel matrices $H_k$ with entries $(H_k)_{ij}=m_{i+j}$, $0\leq i,j\leq k$ (with $m_0=1$), are positive semi-definite. The moments of an exponential distribution with rate $1$ are $m_i=i!$, so the associated Hankel matrices have strictly positive determinant. By continuity of the determinant there's a ball around $v=(1!, \dots, n!)$ on which all of these determinants stay positive, so by Sylvester's criterion the associated Hankel matrices are positive definite and in particular positive semi-definite, and hence every point of that ball is the moment vector of some random variable. Therefore the polynomials $p_n(x*y)$ and $p_n(x)+p_n(y)$ coincide on an open ball $B_\epsilon (v,v)\subset\mathbb R^{2n}$, so they're equal everywhere.
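
To illustrate the moment-problem step, here is a small sympy check (not needed for the proof) that the Hankel matrices built from the exponential moments $m_i=i!$ have strictly positive leading principal minors, hence are positive definite:

```python
# Sketch: leading principal minors of the Hankel matrices of the exponential(1) moments.
import sympy as sp

def hankel_minors(moments, k):
    """Leading principal minors of the (k+1) x (k+1) Hankel matrix (m_{i+j})_{0<=i,j<=k}."""
    H = sp.Matrix(k + 1, k + 1, lambda i, j: moments[i + j])
    return [H[:r, :r].det() for r in range(1, k + 2)]

m = [sp.factorial(i) for i in range(0, 11)]   # exponential(1) moments m_i = i!, with m_0 = 1
for k in range(1, 5):
    print(k, hankel_minors(m, k))             # every minor is strictly positive
```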

Let $\lambda=p_n(0,\dots,0,1)$; I'll prove that $p_n=\lambda\overline B_n$, where $\overline B_n$ is the logarithmic Bell polynomial defined by the relationship of exponential generating functions $$\log\left(1+\sum_{n\geq 1}\frac{x_n z^n}{n!}\right)=\sum_{n\geq 1}\frac{\overline B_n(x_1,\dots,x_n)z^n}{n!}.$$ These polynomials give the cumulants as functions of the moments, but they're rarely given a name in the literature; they satisfy $$\overline B_n(x_1,\dots,x_n)=\sum_{k=1}^n (-1)^{k-1}(k-1)!B_{n,k}(x_1, \dots, x_{n-k+1}),$$ where $B_{n,k}$ are the partial Bell polynomials.
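
As a quick sanity check (a sympy sketch, purely illustrative), the generating-function definition of $\overline B_n$ and the partial Bell polynomial formula above agree for small $n$:

```python
# Sketch: two definitions of the logarithmic Bell polynomial agree for n = 1..5.
import sympy as sp

z = sp.Symbol('z')
N = 5
x = sp.symbols('x1:6')   # x1, ..., x5

def Bbar_from_gf(n):
    """Coefficient of z^n/n! in log(1 + sum_{k>=1} x_k z^k / k!)."""
    f = 1 + sum(x[k - 1] * z**k / sp.factorial(k) for k in range(1, n + 1))
    ser = sp.expand(sp.log(f).series(z, 0, n + 1).removeO())
    return sp.expand(sp.factorial(n) * ser.coeff(z, n))

def Bbar_from_bell(n):
    """sum_{k=1}^n (-1)^(k-1) (k-1)! B_{n,k}(x_1,...,x_{n-k+1}), using sympy's partial Bell polynomials."""
    return sp.expand(sum((-1)**(k - 1) * sp.factorial(k - 1) * sp.bell(n, k, x[:n - k + 1])
                         for k in range(1, n + 1)))

for n in range(1, N + 1):
    assert sp.expand(Bbar_from_gf(n) - Bbar_from_bell(n)) == 0
print("generating-function and Bell-polynomial forms agree up to n =", N)
```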

The operation $*$ gives $\mathbb R^n$ the structure of an abelian Lie group, and it is isomorphic to the usual additive group of Euclidean space via \begin{align*} \varphi:\mathbb R^n&\to \mathbb R^n\\ (x_1, \dots,x_n)&\mapsto\left(h![t^h]\prod_{k=1}^n (1+t^k)^{x_k}\right)_{h\in [n]}. \end{align*}

That is, $\varphi(x+y)=\varphi(x)*\varphi(y)$. The composition $\overline B_n\circ\varphi$ is linear: let $y=\varphi(x)$ (with the convention $y_0=1$); then \begin{align*} \sum_h \frac{y_h}{h!}z^h&=\prod_{k}(1+z^k)^{x_k}+O(z^{n+1})\\ \log\sum_h \frac{y_h}{h!}z^h&=\sum_k x_k\log(1+z^k)+O(z^{n+1})\\ \sum_h \frac{\overline B_h(y)}{h!}z^h&=\sum_k x_k\log(1+z^k)+O(z^{n+1})\\ \overline B_n(y)=\overline B_n\varphi(x)&=n![z^n]\sum_k x_k\log(1+z^k)\\ &=n!\sum_{d|n}x_d[z^n]\log(1+z^d)\\ &=(n-1)!\sum_{d|n}(-1)^{n/d+1}d x_d. \end{align*}

Call this linear functional $L=\overline B_n\circ\varphi$.
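
Here is a small sympy verification (again purely illustrative) that $\overline B_n\circ\varphi$ equals the linear functional $L$ just computed, for $n\le 4$:

```python
# Sketch: check Bbar_n(phi(x)) = (n-1)! * sum_{d | n} (-1)^(n/d+1) d x_d symbolically.
import sympy as sp

t, z = sp.symbols('t z')
x = sp.symbols('x1:5')   # x1, ..., x4

def phi(xs, n):
    """phi(x)_h = h! [t^h] prod_{k=1}^n (1 + t^k)^{x_k}, for h = 1..n (series truncated at degree n)."""
    f = sp.Integer(1)
    for k in range(1, n + 1):
        f = sp.expand(f * sp.series((1 + t**k)**xs[k - 1], t, 0, n + 1).removeO())
    return [sp.expand(sp.factorial(h) * f.coeff(t, h)) for h in range(1, n + 1)]

def Bbar(ys, n):
    """Logarithmic Bell polynomial: coefficient of z^n/n! in log(1 + sum_k y_k z^k / k!)."""
    g = 1 + sum(ys[k - 1] * z**k / sp.factorial(k) for k in range(1, n + 1))
    ser = sp.expand(sp.log(g).series(z, 0, n + 1).removeO())
    return sp.expand(sp.factorial(n) * ser.coeff(z, n))

for n in range(1, 5):
    L = sp.factorial(n - 1) * sum((-1)**(n // d + 1) * d * x[d - 1]
                                  for d in range(1, n + 1) if n % d == 0)
    assert sp.expand(Bbar(phi(x, n), n) - L) == 0
print("Bbar_n(phi(x)) equals the linear functional L for n = 1..4")
```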

By hypothesis, we have $$p_n\circ\varphi(x)+p_n\circ\varphi(y)=p_n(\varphi(x)*\varphi(y)) =p_n\circ\varphi(x+y).$$ So $p_n\circ\varphi$ is a continuous function which preserves addition, hence it must be linear. We can find out which linear map it is by computing its differential at the origin. By the homogeneity condition, $p_n$ is a linear combination of the monomials $x_1^{i_1}\dots x_n^{i_n}$ with $i_1+2i_2+\dots +ni_n=n$. Only one of these monomials, namely $x_n$, has degree one, so near the origin $p_n(x)=\lambda x_n+O(|x|^2)$, which implies $(dp_n)_0=[0,\dots,0,\lambda]$. To compute the differential (Jacobian) of $\varphi$ we consider the generating function that defines it, and compute \begin{align*} \frac{\partial }{\partial x_i}\prod_{k=1}^n (1+t^k)^{x_k}&=\log(1+t^i)\prod_{k=1}^n (1+t^k)^{x_k}\\ \frac{\partial }{\partial x_i}\bigg\rvert_{x=0}\prod_{k=1}^n (1+t^k)^{x_k}&=\log(1+t^i). \end{align*} When we multiply on the left by $(dp_n)_0$ only the last row of $(d\varphi)_0$ survives, so we compute \begin{align*} \frac{\partial\varphi_n}{\partial x_i}\bigg\rvert_{x=0}&=n![t^n]\frac{\partial }{\partial x_i}\bigg\rvert_{x=0}\prod_{k=1}^n(1+t^k)^{x_k}\\ &=n![t^n]\log(1+t^i)\\ &=\begin{cases} n!\frac{(-1)^{n/i+1}}{n/i} & i\mid n\\ 0 & i\nmid n \end{cases} \end{align*} which implies that $d(p_n\circ\varphi)_0=(dp_n)_0\circ (d\varphi)_0=\lambda L$.

Thus $p_n\circ\varphi=\lambda L=\lambda\overline B_n\circ\varphi$, and since $\varphi$ is a bijection we can compose with $\varphi^{-1}$ on the right and obtain $p_n=\lambda\overline B_n$.

We can avoid using the result that $\varphi$ is a bijection by noticing that $(d\varphi)_0$ is a triangular matrix whose diagonal entries are nonzero, so it is invertible. Hence $\varphi$ is invertible in a neighborhood of the origin, so $p_n-\lambda\overline B_n$ is a polynomial that vanishes on a small ball centered at the origin, and is therefore identically zero.
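
To make this concrete, one can tabulate $(d\varphi)_0$ from the entry formula $h![t^h]\log(1+t^i)$ derived above; a short sympy sketch:

```python
# Sketch: the matrix (d phi)_0, entry (h, i) = h! [t^h] log(1 + t^i), is lower
# triangular with diagonal entries h!, hence invertible.
import sympy as sp

t = sp.Symbol('t')
n = 4

def entry(h, i):
    """h! [t^h] log(1 + t^i)."""
    ser = sp.expand(sp.log(1 + t**i).series(t, 0, h + 1).removeO())
    return sp.factorial(h) * ser.coeff(t, h)

J0 = sp.Matrix(n, n, lambda h, i: entry(h + 1, i + 1))
print(J0)         # zero above the diagonal, diagonal entries h!
print(J0.det())   # nonzero, so (d phi)_0 is invertible
```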

In "Twelve problems in probability no one likes to bring up", Rota credits the solution of this problem to Thiele:

Sometime in the last [19th] century, the Danish statistician Thiele observed that the variance of a random variable, namely $\text{Var}(X)=\mathbb E[X^2]-\mathbb E[X]^2$, possesses notable properties, to wit:

  1. It is invariant under translation: $\text{Var}(X+c)=\text{Var}(X)$ for any constant $c$.
  2. If $X$ and $Y$ are independent random variables, then $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$.
  3. $\text{Var}(X)$ is a polynomial in the moments of the random variable $X$.

He then proceeded to determine all nonlinear functionals of a random variable which have the same properties.

I wasn't able to find the original reference from Thiele.

Derivative
  • great answer! going to have to take some time to digest this but at first glance it seems convincing. interesting about the $\varphi$: the argument hinges on the invertibility of this mapping, which, going by the paper you linked, seems like a pretty non-trivial result – Simon Segert Feb 27 '25 at 16:33
  • A few remarks: (a) how do you know that $p_n(0,…,0,1)\neq 0$ (and thus that you can normalize to 1)? Although I actually think this is moot, since if $p_n(0,…,0,1)=0$, then the Jacobian at the origin will vanish; and thus $p_n\circ\varphi=0$. So this corresponds to $p_n=0$ which is indeed a valid solution to the axioms. (b) I also think you don't even need the isomorphism result in the paper you linked, which can make the argument a bit more self-contained. (cont.) – Simon Segert Feb 27 '25 at 19:57
  • (2/2) By your computation, the Jacobian of $\varphi$ at the origin is (if I'm not mistaken) non-zero only for indices where $i|j$. This matrix is triangular with nonzero entries along the diagonal; thus invertible. So $Im(\varphi)$ contains a neighborhood locally isomorphic to $R^n$. Now, by your argument, the zero set of $p_n-\overline{B_n}$ contains $Im(\varphi)$. An algebraic variety either has positive codimension or is equal to the entire space; thus the zero set is $R^n$. – Simon Segert Feb 27 '25 at 19:57
  • @SimonSegert I convinced myself that $p_n(0,\dots,0,1)\neq 0$ by a recursive computation on the coefficients, but what you said gave me the idea of an easier way to do it. I've edited it into the answer. – Derivative Feb 27 '25 at 21:44
  • another question: how do we know that $p_n(x*y)=p_n(x)+p_n(y)$ for *arbitrary* $x$ and $y$? Strictly speaking, my conditions (1)+(3) imply this only when there is a random variable $X$ such that $x_i=EX^i$ and similarly for $y$. But there are vectors that cannot be realized as moment sequences, e.g. $(1,0)$. – Simon Segert Feb 27 '25 at 22:09

Inspired by Derivative's answer above, I did some digging and found the paper Lutz Mattner, *What are cumulants?*, which proves a characterization of the cumulants very similar to the one I asked about. More precisely, the paper considers the space $Prob_r$ of probability measures whose first $r$ moments are finite. Corollary 1.12 in that paper states that any additive, continuous functional on $Prob_r$ must be a linear combination of $\kappa_1,\dots, \kappa_r$. So if we also require homogeneity, then it easily follows that the only solutions are scalar multiples of $\kappa_r$.

The main difference between Mattner's result and my axioms is that I assumed $k'$ can depend on the distribution only through the moments (and only polynomially), whereas he allows any continuous functional on the space of measures (and uses a particular weighted variation distance to topologize that space). So his result implies an affirmative answer to my question.

That said, Derivative's answer above is much shorter than Mattner's argument, and the result is "almost" as strong. (I think Derivative's argument basically goes through if one only assumes $p_n$ to be continuous, rather than polynomial.)

Simon Segert
  • I think it is better to have a search friendly link. One day the link won't work. Yet the description will always be searchable. – Royi Mar 01 '25 at 09:10

Here is another solution.

Step 1. We recall @Derivative's nice observation:

Observation. Let $n \geq 1$ and

$$ \Omega = \{ (\mathbb{E}[X], \ldots, \mathbb{E}[X^n]) : X \in L^n(\mathbb{P})\}. $$

Then $\Omega$ contains an open subset of $\mathbb{R}^n$. In particular, if $p, q \in \mathbb{R}[x_1, \ldots, x_n]$ agree on $\Omega$, then $p = q$.

Step 2. Next, for each $X \in L^n(\mathbb{P})$, let $X_1, X_2, \ldots$ be IID copies of $X$. Also, let $(N_t)_{t\geq 0}$ be a Poisson point process of unit rate, and let $Y_t = \sum_{n=1}^{N_t} X_n$ be the corresponding compound Poisson process. We make some observations:

  1. When $X$ has a finite MGF, we have $\log \mathbb{E}[e^{sY_t}] = t(\mathbb{E}[e^{sX}] - 1)$. From this, we immediately obtain $$\kappa_n(Y_t) = t \mathbb{E}[X^n]. \tag{1}$$ This is initially proved when $X$ is assumed to have a finite MGF, but a truncation argument shows that it continues to hold whenever $X \in L^n(\mathbb{P})$.

  2. Conversely, applying Faà di Bruno's formula to $\mathbb{E}[e^{sY_t}] = \exp(t(\mathbb{E}[e^{sX}] - 1))$, we have $$ \begin{align*} \mathbb{E}[Y_t^n] &= \sum_{\pi \in \Pi(n)} t^{|\pi|} \prod_{B \in \pi} \mathbb{E}[X^{|B|}] \\ &= t \mathbb{E}[X^n] + \cdots + t^n \mathbb{E}[X]^n, \tag{2} \end{align*}$$ where $\pi$ runs through the set $\Pi(n)$ of all partitions of the set $\{1, \ldots, n\}$, and $B\in \pi$ means that $B$ runs through the blocks of the partition $\pi$. We also note that the lowest-order term in $(2)$ is $t \mathbb{E}[X^n]$, which will be used crucially later.

As a corollary of these two observations, we know that

Lemma. For each $t > 0$, the values of $(\mathbb{E}[Y_t], \ldots, \mathbb{E}[Y_t^n])$ uniquely determine the values of $(\mathbb{E}[X], \ldots, \mathbb{E}[X^n])$ and vice versa.
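
Here is a short sympy sketch (illustrative only) checking $(1)$ and $(2)$ with symbolic moments up to $n=4$, by expanding $\exp(t(\mathbb{E}[e^{sX}]-1))$ and comparing against the partition sum:

```python
# Sketch: compound Poisson cumulants and moments with symbolic E[X^i].
import sympy as sp
from sympy.utilities.iterables import multiset_partitions

s, t = sp.symbols('s t')
N = 4
m = sp.symbols('m1:5')                        # m_i stands for E[X^i]

mgf_X = 1 + sum(m[i - 1] * s**i / sp.factorial(i) for i in range(1, N + 1))
cgf_Y = sp.expand(t * (mgf_X - 1))            # log E[e^{s Y_t}], truncated at degree N
mgf_Y = sp.expand(sp.exp(cgf_Y).series(s, 0, N + 1).removeO())

# (1): kappa_n(Y_t) = t E[X^n]
for n in range(1, N + 1):
    kappa_n = sp.expand(sp.factorial(n) * cgf_Y.coeff(s, n))
    assert sp.expand(kappa_n - t * m[n - 1]) == 0

# (2): E[Y_t^n] = sum over set partitions pi of {1,...,n} of t^|pi| prod_{B in pi} E[X^|B|]
for n in range(1, N + 1):
    moment_n = sp.expand(sp.factorial(n) * mgf_Y.coeff(s, n))
    rhs = sum(t**len(pi) * sp.Mul(*[m[len(B) - 1] for B in pi])
              for pi in multiset_partitions(list(range(n))))
    assert sp.expand(moment_n - rhs) == 0

print("checks (1) and (2) pass for n = 1, ...,", N)
```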

Step 3. Continuing, let $p(x_1, \ldots, x_n)$ be a polynomial such that

$$\tilde{\kappa}(X) = p(\mathbb{E}[X], \ldots, \mathbb{E}[X^n])$$

satisfies both additivity and homogeneity as listed in OP.

  1. We have $p(tx_1, t^2x_2, \ldots, t^nx_n) = t^n p(x_1, \ldots, x_n)$ for all $t$ and $x_1, \ldots, x_n$, since this holds true when $(x_1, \ldots, x_n)$ is replaced by a sequence of moments. Consequently, $p$ must be of the form $$ \begin{align*} p(x_1, \ldots, x_n) &= \sum_{\substack{\alpha_1, \ldots, \alpha_n \in \mathbb{Z}_{\geq 0} \\ \alpha_1 + 2\alpha_2 + \cdots + n\alpha_n = n}} c_{\alpha} x_1^{\alpha_1} \cdots x_n^{\alpha_n} \tag{3} \end{align*} $$ for some constants $c_{\alpha}$. In particular,

    • $x_n$ can only occur in $(3)$ when $\alpha = (0, \ldots, 0, 1)$, and
    • all the other terms in $(3)$ are products of at least two (possibly non-distinct) $x_i$'s with $i < n$.
  2. $(Y_t)_{t\geq 0}$ has independent and stationary increments. Consequently, for any $t, s \geq 0$, $$ \tilde{\kappa}(Y_t) + \tilde{\kappa}(Y_s) = \tilde{\kappa}(Y_{t+s}). $$ Since the map $t \mapsto \tilde{\kappa}(Y_t)$ is continuous, it follows that $\tilde{\kappa}(Y_t) = c t$ for some $c$ that depends only on $X$.

  3. Now here is the crux of the argument: Combining $(2)$ and $(3)$, it follows that $$ \begin{align*} \tilde{\kappa}(Y_t) &= c_{0,\ldots,0,1} \mathbb{E}[Y_t^n] + \mathcal{O}(t^2) \\ &= c_{0,\ldots,0,1} t \mathbb{E}[X^n] + \mathcal{O}(t^2). \end{align*} $$ By the previous observation, all the higher-order terms in $t$ must vanish, yielding $$ \tilde{\kappa}(Y_t) = c_{0,\ldots,0,1} t \mathbb{E}[X^n] = c_{0,\ldots,0,1} \kappa_n(Y_t). $$ So by the observation and the lemma, it follows that $\tilde{\kappa}$ is a constant multiple of $\kappa_n$ (and the same is true for the corresponding polynomials).
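
To see the mechanism in a worked instance, here is a sympy sketch for $n=3$: imposing that $p(\mathbb{E}[Y_t], \mathbb{E}[Y_t^2], \mathbb{E}[Y_t^3])$ is linear in $t$ recovers that $p$ must be a scalar multiple of $\kappa_3 = x_3 - 3x_1x_2 + 2x_1^3$.

```python
# Sketch: solve the linearity-in-t constraints for a general weighted-homogeneous p of degree 3.
import sympy as sp

t = sp.Symbol('t')
a, b, c = sp.symbols('a b c')
m1, m2, m3 = sp.symbols('m1 m2 m3')           # moments of X

# moments of the compound Poisson variable Y_t, from the partition formula (2) with n = 3
M1 = t * m1
M2 = t * m2 + t**2 * m1**2
M3 = t * m3 + 3 * t**2 * m1 * m2 + t**3 * m1**3

def p(x1, x2, x3):
    """General weighted-homogeneous polynomial of degree 3, as in (3)."""
    return a * x3 + b * x1 * x2 + c * x1**3

expr = sp.expand(p(M1, M2, M3))
# linearity of t -> kappa~(Y_t) forces the t^2 and t^3 coefficients to vanish for all m1, m2, m3
constraints = [coeff for k in (2, 3)
               for coeff in sp.Poly(expr.coeff(t, k), m1, m2, m3).coeffs()]
sol = sp.solve(constraints, [b, c])
print(sol)                                     # {b: -3*a, c: 2*a}
x1, x2, x3 = sp.symbols('x1 x2 x3')
print(sp.factor(p(x1, x2, x3).subs(sol)))      # a*(2*x1**3 - 3*x1*x2 + x3), i.e. a * kappa_3
```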

Sangchul Lee