I am trying to understand how a Stochastic Process can be fit to real-world data. Specifically, does every Stochastic Process have an underlying Probability Distribution (i.e. a family of finite-dimensional distributions)? If so, I think it should be possible to write down a valid mathematical likelihood function based on this underlying Probability Distribution - and then estimate the required parameters using some estimation technique (e.g. Maximum Likelihood Estimation).
Now, I will add some more context to my question.
In a previous question (Simulating a Function (that is naturally contained) Within an Interval $a,b$), I tried to define a Stochastic Process that "naturally" only exists between points $(a,b)$. This process was based on a modified version of the Ornstein-Uhlenbeck process (which is itself driven by the Wiener Process).
Part 1: Definitions
Wiener Process: As I understand it, the Wiener Process $W_t$ is the standard mathematical model of Brownian Motion; its increments over disjoint intervals are independent Gaussian random variables. Here are some standard properties of Brownian Motion:
- $W_0 = 0$ almost surely.
- $W$ has independent increments: for every $t > 0$, the future increments $W_{t+u} - W_t$, $u \geq 0$, are independent of the past values $W_s$, $s < t$.
- $W$ has Gaussian increments: $W_{t+u} - W_t$ is normally distributed with mean 0 and variance $u$, $W_{t+u} - W_t \sim N(0, u)$.
Ornstein-Uhlenbeck Process: The Ornstein-Uhlenbeck process $X_t$ is defined by the following stochastic differential equation:
$$dX_t = \theta (\mu - X_t) \, dt + \sigma \, dW_t$$
where:
- $\theta > 0$ and $\sigma > 0$ are parameters
- $W_t$ denotes the Wiener process.
- $\mu$ is a drift constant.
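As a sanity check on this definition, the SDE can be simulated with a simple Euler-Maruyama discretisation (a minimal sketch; the step size, seed, and parameter values below are arbitrary choices of mine, not anything from the linked posts):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_ou(theta, mu, sigma, x0, T, n):
    """Euler-Maruyama discretisation of dX_t = theta*(mu - X_t) dt + sigma dW_t."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))  # Wiener increment ~ N(0, dt)
        x[i + 1] = x[i] + theta * (mu - x[i]) * dt + sigma * dw
    return x

# Starting far from mu, the path should revert towards mu over time.
path = simulate_ou(theta=1.5, mu=0.0, sigma=0.3, x0=2.0, T=10.0, n=1000)
```

A plot of `path` shows the mean-reverting behaviour controlled by $\theta$.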
Modifying the Ornstein-Uhlenbeck process to be constrained between two points $(a,b)$: in this answer (https://math.stackexchange.com/a/4828166/791334), I learned that the Ornstein-Uhlenbeck process can be transformed into a new process $Y_t$ that is contained in $(0,1)$:
$$Y_t = \frac{e^{X_t}}{1 + e^{X_t}}$$
By shifting and scaling this transform, I think we should be able to define a process between two points $(a,b)$:
$$Y_t = a + \left(\frac{e^{X_t}}{1 + e^{X_t}}\right) \cdot (b - a)$$
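To illustrate, the logistic transform and the $(a,b)$ rescaling can be checked numerically (a small sketch; the input here is a deterministic stand-in for an Ornstein-Uhlenbeck path, not simulated data):

```python
import numpy as np

def logistic_transform(x, a, b):
    """Map an unconstrained path x into (a, b) via Y = a + (b - a) * e^x / (1 + e^x)."""
    # 1 / (1 + e^{-x}) is the numerically stable form of e^x / (1 + e^x)
    return a + (b - a) / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 201)          # stand-in for an O-U path
y = logistic_transform(x, a=2.0, b=5.0)

# The transformed values never touch the boundaries a and b.
assert np.all((y > 2.0) & (y < 5.0))
```

Since the transform is strictly increasing, it also preserves the ordering of the original path, and $x = 0$ maps to the midpoint $(a+b)/2$.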
My Question: In this post (https://stats.stackexchange.com/questions/605530/estimate-parameters-in-brownian-motion-with-drift-dx-t-mu-dt-sigma-dw-t), an approach is outlined for estimating the parameters of a Brownian Motion with drift using a likelihood-based method: since consecutive increments of the driving Wiener Process are independent Normally distributed random variables, estimating the parameters of the Brownian Motion reduces to Maximum Likelihood Estimation for a Normal Distribution:
> It is well known that (note that $\{W_t\}$ by definition is a Gaussian process) for $0 < t_1 < \cdots < t_k$, the joint density of $(W_{t_1}, \ldots, W_{t_k})$ is (where $t_0 = w_0 = 0$)
> \begin{align}
> f_{t_1\cdots t_k}(w_1, \ldots, w_k) = \prod_{i = 1}^k\frac{1}{\sqrt{2\pi(t_i - t_{i - 1})}} \exp\left[-\frac{(w_i - w_{i - 1})^2}{2(t_i - t_{i - 1})}\right].
> \end{align}
> Since the transformation $\mathbf{X} = \mu\mathbf{t} + \sigma\mathbf{W}$ is affine (where $\mathbf{W} = (W_{t_1}, \ldots, W_{t_k})$, $\mathbf{X} = (X_{t_1}, \ldots, X_{t_k})$, $\mathbf{t} = (t_1, \ldots, t_k)$), the joint density of $(X_{t_1}, \ldots, X_{t_k})$ is then given by (where $t_0 = x_0 = 0$):
> \begin{align}
> g_{t_1\cdots t_k}(x_1, \ldots, x_k) &= \frac{1}{\sigma^k} \prod_{i = 1}^k\frac{1}{\sqrt{2\pi(t_i - t_{i - 1})}} \exp\left[-\frac{\left(\sigma^{-1}(x_i - \mu t_i) - \sigma^{-1}(x_{i - 1} - \mu t_{i - 1})\right)^2}{2(t_i - t_{i - 1})}\right] \\
> &= \frac{1}{\sigma^k} \prod_{i = 1}^k\frac{1}{\sqrt{2\pi(t_i - t_{i - 1})}} \exp\left[-\frac{(x_i - x_{i - 1} - \mu(t_i - t_{i - 1}))^2}{2\sigma^2(t_i - t_{i - 1})}\right].
> \end{align}
>
> This means that given data $x_1, \ldots, x_k$ observed at $0 < t_1 < \cdots < t_k$, the log-likelihood function of $(\mu, \sigma)$ is
> \begin{align}
> -k\log\sigma - \frac{1}{2}\sum_{i = 1}^k\log(2\pi(t_i - t_{i - 1})) - \frac{1}{2\sigma^2}\sum_{i = 1}^k\frac{(x_i - x_{i - 1} - \mu(t_i - t_{i - 1}))^2}{t_i - t_{i - 1}}. \tag{1}
> \end{align}
>
> From $(1)$ it is easy to determine the MLE of $\mu$ and $\sigma$.
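For what it's worth, setting the derivatives of $(1)$ to zero gives $\hat\mu = x_k / t_k$ and $\hat\sigma^2 = \frac{1}{k}\sum_i (x_i - x_{i-1} - \hat\mu\,\Delta t_i)^2 / \Delta t_i$, and these can be verified on simulated data (a sketch; the simulation settings are my own arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate Brownian motion with drift: increments are N(mu*dt_i, sigma^2*dt_i).
mu_true, sigma_true = 0.7, 1.3
t = np.cumsum(rng.uniform(0.01, 0.1, size=5000))   # irregular observation times
dt = np.diff(np.concatenate(([0.0], t)))
dx = rng.normal(mu_true * dt, sigma_true * np.sqrt(dt))
x = np.cumsum(dx)

# Closed-form maximisers of the log-likelihood (1):
mu_hat = x[-1] / t[-1]                             # = sum(dx) / sum(dt)
sigma_hat = np.sqrt(np.mean((dx - mu_hat * dt) ** 2 / dt))
```

With 5000 increments, `mu_hat` and `sigma_hat` land close to the true values 0.7 and 1.3.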
In my modified process $Y_t$, I have an additional parameter $\theta$ (note that $a$ and $b$ are pre-defined and do not need to be estimated).
- The increments of a Brownian Motion over disjoint intervals are independent Normal random variables - this is what makes it possible to construct a valid likelihood function. However, I am not sure whether the increments of my process $Y_t$ have any similarly tractable distribution, and therefore whether a valid likelihood function for $Y_t$ exists and can be constructed.
- Is it still somehow possible to construct a valid mathematical likelihood function corresponding to $Y_t$, such that all of its parameters can be estimated via Maximum Likelihood Estimation?
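To make the question concrete, here is the kind of construction I have in mind (this is my own sketch, not something established in the linked posts): since the map from $X_t$ to $Y_t$ is strictly monotone, it can be inverted as $X_t = \log\frac{Y_t - a}{b - Y_t}$, and the exact Gaussian transition density of the Ornstein-Uhlenbeck process, $X_{t+\Delta} \mid X_t \sim N\big(\mu + (X_t - \mu)e^{-\theta\Delta},\ \tfrac{\sigma^2}{2\theta}(1 - e^{-2\theta\Delta})\big)$, can then be evaluated on the recovered path:

```python
import numpy as np
from scipy.stats import norm

def neg_loglik(params, y, t, a, b):
    """Negative log-likelihood of (theta, mu, sigma) for observations y of the
    bounded process, using the exact O-U transition density after inverting
    Y = a + (b - a) * e^X / (1 + e^X)  =>  X = log((Y - a) / (b - Y))."""
    theta, mu, sigma = params
    if theta <= 0 or sigma <= 0:
        return np.inf
    x = np.log((y - a) / (b - y))      # recover the latent O-U path
    dt = np.diff(t)
    mean = mu + (x[:-1] - mu) * np.exp(-theta * dt)
    var = sigma**2 * (1.0 - np.exp(-2.0 * theta * dt)) / (2.0 * theta)
    # NOTE: this is the likelihood of the latent X-path; the likelihood of the
    # Y-path adds the Jacobian of the transform, which is free of
    # (theta, mu, sigma) and therefore does not change the argmax.
    return -np.sum(norm.logpdf(x[1:], loc=mean, scale=np.sqrt(var)))
```

The resulting function could then be handed to a numerical optimiser such as `scipy.optimize.minimize` to obtain estimates of $(\theta, \mu, \sigma)$ - but I am unsure whether this reasoning is sound, hence the question.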
Thanks!