
I am trying to understand the general logic on how to write the Likelihood Function of a Stochastic Process.

In some situations, I think I understand how this is done. For example, suppose $X_t$ is a Wiener Process (https://en.wikipedia.org/wiki/Wiener_process) with drift $\mu$ and diffusion coefficient $\sigma$:

$$ dX_t = \mu dt + \sigma dW_t $$

By definition, we know that $dW_t \sim N(0, dt)$. We also know that increments of the Wiener Process over disjoint intervals are independent. Suppose I had some real data and I believed it originated from a Wiener Process. I could first define a new random variable $Z$ based on increments of the process between available observation times:

$$ Z = X_{t+\Delta t} - X_t $$ $$ X_{t+\Delta t} = X_t + \mu \Delta t + \sigma \sqrt{\Delta t}\, W, \quad W \sim N(0,1) $$

$$ Z = (X_t + \mu \Delta t + \sigma \sqrt{\Delta t}\, W) - X_t $$ $$ Z = \mu \Delta t + \sigma \sqrt{\Delta t}\, W $$

From here, we can see that $ Z \sim N(\mu \cdot \Delta t, \sigma^2 \Delta t)$. I should be able to write a Likelihood Function based on $Z$:

$$ L(\mu, \sigma^2 | Z_1, Z_2, ..., Z_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2\Delta t}} \exp\left(-\frac{(Z_i - \mu\Delta t)^2}{2\sigma^2\Delta t}\right) $$
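As a sanity check, this likelihood can be maximized in closed form: $\hat{\mu} = \bar{Z}/\Delta t$ and $\hat{\sigma}^2 = \frac{1}{n \Delta t}\sum_{i=1}^n (Z_i - \hat{\mu}\Delta t)^2$. Here is a minimal simulation sketch (the parameter values are arbitrary, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" parameters, chosen only for illustration
mu, sigma, dt, n = 0.5, 2.0, 0.01, 100_000

# Increments Z_i ~ N(mu*dt, sigma^2*dt), simulated directly
Z = rng.normal(mu * dt, sigma * np.sqrt(dt), size=n)

# Closed-form MLEs from the Gaussian likelihood above
mu_hat = Z.mean() / dt
sigma2_hat = np.mean((Z - mu_hat * dt) ** 2) / dt

print(mu_hat, sigma2_hat)  # should be close to 0.5 and 4.0
```

Note that the precision of $\hat{\mu}$ depends on the total time span $n\Delta t$, while the precision of $\hat{\sigma}^2$ depends on the number of increments $n$.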

I now want to repeat this logic and derive the likelihood function for a Continuous Time Birth-Death Stochastic Process (https://en.wikipedia.org/wiki/Birth%E2%80%93death_process).

Previously, I attempted to mathematically derive a similar likelihood function for the discrete time case (for different scenarios) in Estimating Population Growth with Limited Information. I also tried to reason this out myself in Can an Infinite Matrix be Partitioned? and Relationship Between Poisson Process and Birth and Death Process.

Let $X_t$ be the state of the system at time $t$, representing the number of individuals in the population. Define the birth rate $\lambda_n$ (the rate at which the population increases from $n$ to $n+1$) and the death rate $\mu_n$ (the rate at which the population decreases from $n$ to $n-1$).

Suppose we observe the system at time points $t_0, t_1, \ldots, t_n$ with corresponding states $X_{t_0} = x_0, X_{t_1} = x_1, \ldots, X_{t_n} = x_n$

From here, I think we can define an (infinite) Generator Matrix:

$$Q_{ij} = \begin{cases} \lambda_i & \text{if } j = i + 1 \\ \mu_i & \text{if } j = i - 1 \\ -(\lambda_i + \mu_i) & \text{if } j = i \\ 0 & \text{otherwise} \end{cases}$$

In Continuous Time Stochastic Processes, we usually use the following equation to relate the Generator Matrix to the Transition Probability Matrix:

$$P(t) = e^{Qt}$$ and, for small $\Delta t$, $$P_{ij}(\Delta t) \approx \begin{cases} \lambda_i \Delta t & \text{if } j = i + 1 \\ \mu_i \Delta t & \text{if } j = i - 1 \\ 1 - (\lambda_i + \mu_i) \Delta t & \text{if } j = i \\ 0 & \text{otherwise} \end{cases}$$
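The relation $P(\Delta t) = e^{Q \Delta t}$ and its small-$\Delta t$ approximation can be checked numerically by truncating the infinite generator at some large maximum population $N$ (a finite approximation, valid as long as the process is unlikely to reach the boundary). A sketch with hypothetical constant rates:

```python
import numpy as np
from scipy.linalg import expm

# Truncate the infinite generator at a maximum population N (approximation)
N = 50
lam = lambda i: 0.8                     # hypothetical constant birth rate
mu_ = lambda i: 0.5 if i > 0 else 0.0   # hypothetical death rate (0 at state 0)

Q = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        Q[i, i + 1] = lam(i)
    if i > 0:
        Q[i, i - 1] = mu_(i)
for i in range(N + 1):
    Q[i, i] = -Q[i].sum()               # rows of a generator sum to zero

dt = 0.01
P = expm(Q * dt)                        # transition matrix P(dt) = exp(Q*dt)

# For small dt, P[i, i+1] should be close to lambda_i * dt
print(P[10, 11], lam(10) * dt)
```

Each row of `P` sums to one, as a transition probability matrix must, and the off-diagonal entries match $\lambda_i \Delta t$ and $\mu_i \Delta t$ up to $O(\Delta t^2)$ corrections.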

Just as we transformed the above Wiener Process into iid increments, I think we can use the Markov property of the Birth-Death Process (conditional on the current state, what happens over disjoint time intervals is independent of the past) to write the likelihood as a product of transition probabilities:

$$L(\{\lambda_i\}, \{\mu_i\} | x_0, x_1, \ldots, x_n) = \prod_{k=1}^{n} P_{x_{k-1}, x_k}(t_k - t_{k-1})$$

$$L(\{\lambda_i\}, \{\mu_i\} | x_0, x_1, \ldots, x_n) \approx \prod_{k=1}^{n} \begin{cases} \lambda_{x_{k-1}} \Delta t_k & \text{if } x_k = x_{k-1} + 1 \\ \mu_{x_{k-1}} \Delta t_k & \text{if } x_k = x_{k-1} - 1 \\ 1 - (\lambda_{x_{k-1}} + \mu_{x_{k-1}}) \Delta t_k & \text{if } x_k = x_{k-1} \end{cases}$$ where $\Delta t_k = t_k - t_{k-1}$.

Without the approximation, the likelihood is exactly $$L(\{\lambda_i\}, \{\mu_i\} | x_0, x_1, \ldots, x_n) = \prod_{k=1}^{n} P_{x_{k-1}, x_k}(t_k - t_{k-1}),$$ where the $P_{x_{k-1}, x_k}(t_k - t_{k-1})$ are the exact transition probabilities obtained from the matrix exponential $e^{Q(t_k - t_{k-1})}$.
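Putting this together, here is a sketch of the exact log-likelihood for discretely observed data, up to truncating the state space at some $N$; the function name, the linear rate functions, and the toy data are all hypothetical choices for illustration:

```python
import numpy as np
from scipy.linalg import expm

def bd_loglik(times, states, lam, mu, N=100):
    """Log-likelihood of discretely observed birth-death data, using the
    generator truncated at maximum population N (an approximation)."""
    Q = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        if i < N:
            Q[i, i + 1] = lam(i)
        if i > 0:
            Q[i, i - 1] = mu(i)
        Q[i, i] = -(lam(i) * (i < N) + mu(i) * (i > 0))
    ll = 0.0
    for k in range(1, len(times)):
        # exact transition probability P_{x_{k-1}, x_k}(t_k - t_{k-1})
        P = expm(Q * (times[k] - times[k - 1]))
        ll += np.log(P[states[k - 1], states[k]])
    return ll

# Toy data: hypothetical linear rates lambda_i = 0.3*i, mu_i = 0.2*i
ll = bd_loglik([0.0, 1.0, 2.0], [5, 6, 6],
               lambda i: 0.3 * i, lambda i: 0.2 * i)
print(ll)
```

If many intervals share the same length, the matrix exponential should be computed once per distinct length rather than per interval; maximizing `bd_loglik` over the rate parameters then gives the MLE.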

Can someone please show me how this might be extended to the Continuous Time case?


1 Answer


I don't think anything is wrong with discretizing the data-gathering process and finding the likelihood that way (exactly as you did above). However, it is an interesting mathematical question, so I'll describe how to generalize the likelihood function beyond random variable/vector data. This method does provide some nice insight into the second example you provided. You can actually compute a likelihood function assuming that you observe all the times in which the birth-death process jumps as well as the population after each jump.

Generalization of the Likelihood Function: Suppose you have a random quantity $X$ that is not necessarily a random variable or vector. In your question, you set $X$ to be a diffusion or a birth-death process. However, let's be general. Suppose you also have parameters $\theta$ (for diffusions, $\theta = (\mu,\sigma)$ and for birth-death processes $\theta = \{\lambda_i,\mu_i,i=1,2,\dots\}$).

In general, the likelihood function $L$ is what is known as a Radon-Nikodym derivative. Let $\mu_{\theta}$ be the distribution of $X$ with parameter $\theta$ and let $\nu$ be some measure such that $\mu_{\theta}$ is always absolutely continuous with respect to $\nu$. Then, $$L(\theta;x) = \frac{d\mu_{\theta}}{d\nu}(x).$$

To see how this works, suppose $X_1,X_2,\dots,X_n \sim \mathcal{N}(\mu,\sigma^2)$. Then $\mu_{(\mu,\sigma)}$ is the multivariate normal distribution with mean $\mu\mathbf{1}$ (where $\mathbf{1}$ is the $n$-dimensional vector of ones) and with covariance matrix $\sigma^2I_n$ (where $I_n$ is the $n\times n$ identity matrix). When we set $\nu$ to be the Lebesgue measure on $\mathbb{R}^n$, you can easily compute $$L(\mu,\sigma;x_1,\dots,x_n) = \prod_{i=1}^n\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right),$$ which is precisely the likelihood you would expect. If your data is discrete, then you get the likelihood by setting $\nu$ to the measure such that $\nu(\{x\}) = 1$ for all possible observations $x$. You can likewise use this framework to construct likelihoods for mixed data.

Here's some rough intuition behind the definition. Let's suppose you observe the data $x$, but due to measurement error or some other reason the true value of the data can only be said to be approximately equal to $x$. Then for $\epsilon > 0$, you can define a set $A_{\epsilon}\ni x$ describing possible outcomes that are $\epsilon$-close to $x$. Assume $\mu_{\theta}(A_{\epsilon}) \to 0$ as $\epsilon\to 0$ for all values of $\theta$. Then (assuming standard regularity conditions hold), $$P_{\theta}\left(X \in A_{\epsilon}\right) = \int_{A_{\epsilon}}\,d\mu_{\theta} = \int_{A_{\epsilon}}\frac{d\mu_{\theta}}{d\nu}(y)\,d\nu(y) \approx \nu(A_{\epsilon})L(\theta;x).$$ So if $L(\theta_1;x) > L(\theta_2;x)$, this implies the probability that $X\approx x$ is larger under $\theta_1$ than it is under $\theta_2$. This motivates the idea that the MLE is a good estimator of the true $\theta$.
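This approximation can be checked numerically in the scalar Gaussian case, taking $A_{\epsilon} = (x-\epsilon, x+\epsilon)$ and $\nu$ the Lebesgue measure, so $\nu(A_{\epsilon}) = 2\epsilon$ (the parameter values below are arbitrary, just for illustration):

```python
from statistics import NormalDist

# X ~ N(mu, sigma^2); check P(X in A_eps) / nu(A_eps) -> L(theta; x) = pdf(x)
mu, sigma, x = 0.5, 2.0, 1.3
d = NormalDist(mu, sigma)
for eps in (0.1, 0.01, 0.001):
    prob = d.cdf(x + eps) - d.cdf(x - eps)  # P(X in (x-eps, x+eps))
    print(eps, prob / (2 * eps))            # converges to d.pdf(x)
print(d.pdf(x))
```

The ratio converges to the density at $x$, which is exactly the Radon-Nikodym derivative $\frac{d\mu_{\theta}}{d\nu}(x)$ in this case.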

Birth-death processes: Based on the likelihood you provided, I'm assuming you allow the population to be negative. I briefly explain what to do below if you don't want to allow for a negative population. Assume we observe the times the birth-death process jumps up to time $T$ as well as its values after every jump (and at time $0$).

A useful choice of $\nu$ in this case is the distribution of the birth-death process with $\lambda_i = \mu_i = 1$ for all $i$. Suppose that $x$ is one possible trajectory of the birth-death process whose jumps occur at times $0 < t_1 < \cdots < t_k < T$. For $j = 1,\dots,k$, let $y_j = x(t_j)-x(t_j-)$, so $y_j = 1$ if the population increases by $1$ and $y_j = -1$ if it decreases by $1$ at time $t_j$. Then you can write out the likelihood as follows: $$L(\{\lambda_i,\mu_i\};x) = \left(\prod_{\substack{j=1,\dots,k\\ y_j=1}} \lambda_{x(t_j-)}\prod_{\substack{j=1,\dots,k\\ y_j=-1}} \mu_{x(t_j-)}\right)\exp\left(-\int_0^T \lambda_{x(s)}+\mu_{x(s)}-2\,ds\right).$$ For a nice reference, check out Bremaud's 2020 book (I think it's chapter 15); the full derivation is too long to include here. If you don't want to allow for negative population sizes, you can set $\mu_0 = 0$ (and say $L(x) = 0$ if $x(0) < 0$). Then, whenever $x$ would require a negative population size, the likelihood function equals $0$.
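To make the formula concrete, here is a small sketch that evaluates this log-likelihood for a fully observed trajectory; the function name and the representation of the path as jump times plus signs are my own choices, not from any particular library. Note that with $\lambda_i = \mu_i = 1$ the formula must give $L = 1$ (log-likelihood $0$), since then $\mu_{\theta} = \nu$:

```python
import math

def bd_path_loglik(T, x0, jump_times, jump_signs, lam, mu):
    """Log-likelihood of a fully observed birth-death path on [0, T],
    relative to the rate-1 reference measure (the formula above).
    jump_signs[j] is +1 for a birth, -1 for a death."""
    ll = 0.0
    x, t_prev = x0, 0.0
    for t, s in zip(jump_times, jump_signs):
        # exponential term: -(lambda_x + mu_x - 2) integrated over the hold
        ll -= (lam(x) + mu(x) - 2.0) * (t - t_prev)
        # jump term: lambda or mu evaluated at the state just before the jump
        ll += math.log(lam(x) if s == 1 else mu(x))
        x += s
        t_prev = t
    ll -= (lam(x) + mu(x) - 2.0) * (T - t_prev)  # final holding interval
    return ll

# With lambda_i = mu_i = 1 the trajectory has the reference law, so ll = 0
print(bd_path_loglik(2.0, 3, [0.5, 1.2], [1, -1],
                     lambda i: 1.0, lambda i: 1.0))  # 0.0
```

In practice you would maximize this over parametric families such as $\lambda_i = \lambda i$, $\mu_i = \mu i$; the jump terms then count births and deaths while the integral term penalizes total exposure.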

Diffusion case: In this case, I strongly recommend discretizing the process as you did. But let's see what happens.

First, suppose that $\mu$ is known but $\sigma$ is unknown. Let's say you can find the value of $X_t$ for as many values of $t$ as you like. Then by the law of the iterated logarithm,

$$\limsup_{t \to 0} \frac{X_t}{\sqrt{2t\ln(\ln(1/t))}} = \sigma \text{ almost surely.}$$

This means that if you know the entire trajectory of $X$, you also know the value of $\sigma$ so (assuming $\mu$ is known to keep things simple), $$L(s) = \begin{cases} 1 &\text{ if } s = \sigma,\\ 0 &\text{ if } s \neq \sigma. \end{cases}$$ In practice, you could only estimate $\sigma$ since you would only have time to observe finitely many data points. The easiest way to do that would probably be to do what you did.
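A practical finite-sample analogue of "the path determines $\sigma$" is the realized quadratic variation: $\sum_i (X_{t_{i+1}} - X_{t_i})^2 \to \sigma^2 T$ as the observation grid is refined, regardless of $\mu$ (the drift contributes only $O(\Delta t)$). A quick simulation sketch, with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, T, n = 1.0, 2.0, 1.0, 100_000
dt = T / n

# Simulate increments of a drifted Brownian path on a fine grid
dX = mu * dt + sigma * np.sqrt(dt) * rng.normal(size=n)

# Realized quadratic variation: sum of squared increments / T -> sigma^2,
# independently of the (unknown) drift mu
sigma2_hat = np.sum(dX ** 2) / T
print(sigma2_hat)  # close to 4.0
```

This is essentially the discretized estimator from the question, and it is consistent as $\Delta t \to 0$ even over a fixed time horizon $T$.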

If $\sigma$ is known but $\mu$ is unknown, you can use Girsanov's theorem to find the likelihood by setting $\nu$ to be the distribution of $\sigma B_t$, where $B$ is a Brownian motion (Wikipedia has a good article on Girsanov's theorem). Assuming $X(0) = 0$, this gives $$L(m;x) = \exp\left(\frac{1}{\sigma^2}\left(m X(T) - \frac{m^2 T}{2}\right)\right).$$
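As a quick sketch of how this is used (all numbers below are hypothetical): the log-likelihood $\frac{1}{\sigma^2}\left(m X(T) - \frac{m^2 T}{2}\right)$ is a concave quadratic in $m$, so the MLE is $\hat{m} = X(T)/T$, which a grid search recovers:

```python
import numpy as np

def girsanov_loglik(m, X_T, T, sigma):
    """Log-likelihood of drift m for a fully observed path with known sigma,
    assuming X(0) = 0 (the Girsanov formula above)."""
    return (m * X_T - 0.5 * m ** 2 * T) / sigma ** 2

# Hypothetical observed endpoint; maximizing over m gives m_hat = X(T)/T
X_T, T, sigma = 3.5, 2.0, 1.0
m_grid = np.linspace(-5, 5, 10_001)
m_hat = m_grid[np.argmax(girsanov_loglik(m_grid, X_T, T, sigma))]
print(m_hat)  # close to X_T / T = 1.75
```

Notably, the likelihood depends on the path only through the endpoint $X(T)$, so $X(T)$ is a sufficient statistic for the drift in this model.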

Combining the above, if both $\mu$ and $\sigma$ are unknown, then $$L(m,s;x) = \begin{cases} \exp\left(\frac{1}{s^2}\left(m X(T) - \frac{m^2 T}{2}\right)\right) &\text{ if } s = \sigma,\\ 0 &\text{ otherwise.} \end{cases}$$

  • WOW! thank you so much! This answer will truly be helpful for all students studying this in the future! – konofoso Aug 25 '24 at 02:48