
It is known that the time derivative of the Wiener process $W(t)$ is defined as white noise $\xi(t)$ \begin{align} \xi(t) = \frac{dW}{dt} \end{align} By writing $dW/dt$ in finite-difference form \begin{align} \frac{dW}{dt} \approx \frac{1}{h} \Big[ W(t+h) - W(t) \Big] \end{align} and taking $h \to 0$, the book "An Introduction to Stochastic Differential Equations" by Lawrence C. Evans (Chapter 3) shows that the statistics of $\xi(t)$ are given by \begin{align} E[ \, \xi(t) \, ] =0, \quad \mathrm{and} \quad E[ \, \xi(t) \xi(s) \, ] = \delta(t-s) \end{align} which is exactly what we expect of white noise.
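As a quick numerical sanity check of these statistics (a rough sketch of my own; the step size $h$ and sample count are arbitrary choices), we can simulate Brownian increments and see the discrete analogue: the variance of the difference quotient blows up like $1/h$, the discrete stand-in for $\delta(t-s)$, while disjoint increments are uncorrelated.

```python
# Sketch: approximate white noise by difference quotients of Brownian
# increments. h and n_samples are arbitrary choices, not from Evans.
import numpy as np

rng = np.random.default_rng(0)
n_samples, h = 100_000, 0.01

# Two disjoint increments W(t+h)-W(t) and W(t+2h)-W(t+h), each N(0, h).
dW = rng.normal(0.0, np.sqrt(h), size=(n_samples, 2))
xi = dW / h

print(xi.mean(axis=0))   # ~ [0, 0]
print(np.cov(xi.T))      # diagonal ~ 1/h = 100, off-diagonal ~ 0
```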

My Question: Motivated by the derivation in that book, I was wondering why we can't take the second time derivative of the Wiener process (equivalently, the time derivative of white noise). Here is my attempt to differentiate white noise in time and derive the corresponding statistics.

Define $\eta(t) = [ \,\xi(t+h) - \xi(t) \, ] \, / \, h $ and take $h \to 0$ at the last step.

Expected value of $\eta(t)$: \begin{align} E[ \, \eta(t) \,] = \frac{1}{h} \Big\{ E[ \, \xi(t+h) \,] - E[ \, \xi(t) \,] \Big\} = 0 \end{align}

Covariance of $\eta(t)$: \begin{align} E[ \, \eta(t) \eta(s) \,] &= \frac{1}{h^{2}} E \Bigg[ \Big( \xi(t+h) - \xi(t) \Big) \; \Big( \xi(s+h) - \xi(s) \Big) \Bigg] \\ &= \frac{1}{h^{2}} \; \Bigg[ E[\xi(t+h) \xi(s+h)] - E[\xi(t+h) \xi(s)] - E[\xi(t) \xi(s+h)] + E[\xi(t) \xi(s)] \Bigg] \\ &= - \frac{1}{h^{2}} \Big[ \delta(t-s+h) - 2\delta(t-s) + \delta(t-s-h) \Big] \\ &= - \frac{d^{2}}{dz^{2}} \delta(z) \Bigg|_{z = t-s} \qquad \mathrm{as} \quad h \to 0 \\ &= - \frac{2\delta(t-s)}{(t-s)^{2}} \end{align}
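To see what this looks like before the limit, here is a rough numerical check of the finite-$h$ covariance (again a sketch of my own, with arbitrary grid parameters): the discrete second difference of $W$ has covariance $2/h^{3}$ at lag $0$, $-1/h^{3}$ at lag $h$, and $0$ beyond, a discrete second difference of the delta that diverges as $h \to 0$.

```python
# Sketch: covariance of eta_h(t) = [W(t+2h) - 2W(t+h) + W(t)] / h^2 for
# finite h. Grid spacing h and path count are arbitrary choices.
import numpy as np

rng = np.random.default_rng(1)
n_paths, h, n_steps = 200_000, 0.05, 6

# Brownian paths on a grid of spacing h, built from independent increments.
W = np.cumsum(rng.normal(0.0, np.sqrt(h), size=(n_paths, n_steps)), axis=1)
eta = (W[:, 2:] - 2 * W[:, 1:-1] + W[:, :-2]) / h**2

for lag in range(3):
    print(lag, np.mean(eta[:, 0] * eta[:, lag]))
# expect roughly 2/h^3 = 16000, -1/h^3 = -8000, then ~0
```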

It seems that we can define $\eta(t)$ (the time derivative of white noise) with the above statistics. Is there any fault in the above derivation?

Follow-up

In stochastic differential equations (SDEs), we usually write $dX_{t}$ rather than $dX/dt$, where $X_{t}$ is a stochastic process; the SDE is really shorthand for an integral equation. However, in physics and nonlinear dynamics we often see notation like $dX/dt$ (the time derivative of a random process). In some papers we can even see the time derivative of white noise, and that bothers me a lot, since I have always heard that the time derivative of white noise is undefined. So, what is the fundamental reason for not defining the time derivative of such a process?

K_inverse

1 Answer


There isn't really an issue with taking derivatives of stochastic processes like $W$, so long as you interpret the resulting process appropriately. Even the usual white noise process "$\xi = \frac{dW}{dt}$" should really be interpreted as a generalized stochastic process; that is, the realizations of $\xi$ are generalized functions. This is because, as you allude to, realizations of $W$ are almost surely nowhere differentiable. However, they do have derivatives "in the sense of distributions", that is, generalized derivatives, and this is one way to attack the problem (the Itô calculus/stochastic differential form $dW_t$ is another). If you have never seen the theory of generalized functions (called distributions elsewhere, but in probability that word has another meaning), the following will probably not make too much sense to you, but this is how I work with these things. Gel'fand and Vilenkin ("Generalized Functions, Volume 4") is the classic reference for this approach, though there are probably better modern references.

To define a generalized stochastic process $\eta$, you fix a space of test functions - usually smooth, compactly supported functions $\mathcal{D} = C_0^\infty$. Then, a generalized stochastic process $\eta(\omega)$ is a random element of $\mathcal{D}^\prime$ (a map $\eta:\Omega\rightarrow\mathcal{D}^\prime$ where $(\Omega,\mathcal{F},\mathbb{P})$ is a probability space). A much more convenient way to say this is that given any test function $\varphi\in\mathcal{D}$, we have that

$$ X_\varphi = \langle \eta,\varphi\rangle $$ is an ordinary real random variable. The bracket notation is intended to "look like" an inner product, i.e. you can think of $\langle \eta,\varphi\rangle = \int \eta(x)\varphi(x)dx$, though this isn't really correct because $\eta$ is "not a function".

The mean and covariance are then defined as

$$ \langle\mathbb{E}[\eta],\varphi\rangle = \mathbb{E}[\langle\eta,\varphi\rangle] = \mathbb{E}[X_\varphi] $$ and

$$ Cov(\varphi,\psi) = \mathbb{E}[X_\varphi X_\psi] $$ From this, you can extract the covariance operator via

$$ \mathbb{E}[X_\varphi X_\psi] = \langle \mathcal{C}\varphi,\psi\rangle $$ This formula is difficult to parse until you work some examples - we'll see in a second how this works.

Returning to your original question: suppose we want to define $\dot{W}$ using this approach. Well, in the theory of generalized functions, we have the definition

$$ X_\varphi = \langle \dot{W},\varphi\rangle = - \langle W,\dot{\varphi}\rangle $$ The negative sign comes from "integration by parts". Now, because $W$ is (almost surely) continuous and $\dot{\varphi}$ is smooth, we can use integrals instead of "abstract brackets":

$$ X_\varphi(\omega) = -\int_{-\infty}^\infty W(t,\omega)\dot{\varphi}(t) dt $$ Thus (interchanging limits requires a moment of justification):

$$ \mathbb{E}[X_\varphi(\omega)] = -\int_{-\infty}^\infty \mathbb{E}[W(t,\omega)] \dot{\varphi}(t) dt = 0 $$ and

$$ \mathbb{E}[X_\varphi(\omega)X_\psi(\omega)] = \int_{-\infty}^\infty\int_{-\infty}^\infty \mathbb{E}[W(s,\omega)W(t,\omega)] \dot{\varphi}(s)\dot{\psi}(t) dsdt = \int_{-\infty}^\infty\int_{-\infty}^\infty \min(s,t) \dot{\varphi}(s)\dot{\psi}(t) dsdt $$ To see how this results in "$k(s,t) = \delta(s-t)$" covariance, you do a bit of calculus, remembering that $\varphi(s)$ and $\psi(t)$ are smooth and compactly supported so all the integration by parts boundary terms vanish, and you see that

$$ \int_{-\infty}^\infty\int_{-\infty}^\infty \min(s,t) \dot{\varphi}(s)\dot{\psi}(t) dsdt = \int_{-\infty}^\infty \varphi(t) \psi(t) dt $$ Thus we have written

$$ \mathbb{E}[X_\varphi X_\psi] = \langle \mathcal{C}\varphi,\psi\rangle $$where $\mathcal{C}$ is the "identity operator", that is the convolution operator with kernel $\delta(s-t)$.
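If it helps to see this concretely, here is a minimal Monte Carlo sketch (my own, not from any reference): Gaussian bumps stand in for compactly supported test functions since they decay rapidly, the brackets become crude Riemann sums, and we check $\mathbb{E}[X_\varphi X_\psi] \approx \int \varphi\psi\,dt$.

```python
# Sketch: sample X_phi = -int W(t) phi'(t) dt over simulated paths and
# compare E[X_phi X_psi] with int phi psi dt. The grid, path count, and
# Gaussian test functions are all arbitrary choices of mine.
import numpy as np

rng = np.random.default_rng(2)
n_paths, dt = 10_000, 0.01
t = np.arange(dt, 10.0, dt)                   # grid on (0, 10), W(0) = 0

phi, psi = np.exp(-(t - 4.0) ** 2), np.exp(-(t - 4.5) ** 2)
dphi, dpsi = np.gradient(phi, dt), np.gradient(psi, dt)

W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, t.size)), axis=1)
X_phi = -(W * dphi).sum(axis=1) * dt          # -<W, phi'> per path
X_psi = -(W * dpsi).sum(axis=1) * dt

print(X_phi.mean())                           # ~ 0
print(np.mean(X_phi * X_psi))                 # Monte Carlo covariance
print(np.sum(phi * psi) * dt)                 # ~ int phi psi dt (~ 1.11)
```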

If you want to do the same thing but with $\ddot{W}$, you would start with the definition of the generalized ("distributional") second derivative:

$$ \langle\ddot{W},\varphi\rangle = \langle W,\ddot{\varphi}\rangle $$ You can then work through the same process to see that

$$ \langle\mathbb{E}[\ddot{W}],\varphi\rangle = \langle\mathbb{E}[W],\ddot{\varphi} \rangle = 0 $$ and

$$ \mathbb{E}[X_\varphi X_\psi] = \int_{-\infty}^\infty\int_{-\infty}^\infty\min(s,t) \ddot{\varphi}(s)\ddot{\psi}(t) dsdt = -\int_{-\infty}^\infty \ddot{\varphi}(t)\psi(t) dt = \langle \mathcal{C}\varphi,\psi\rangle $$ Thus the covariance operator is the negative second derivative, i.e. the covariance kernel function is $-\ddot{\delta}(s-t)$.
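The same kind of numerical check works here (again a sketch, with the same caveats about test functions and discretization as above): sample $Y_\varphi = \langle W,\ddot{\varphi}\rangle$ and compare $\mathbb{E}[Y_\varphi Y_\psi]$ with $-\int \ddot{\varphi}\,\psi\,dt$, i.e. the kernel $-\ddot{\delta}(s-t)$ tested against $\varphi$ and $\psi$.

```python
# Sketch: Monte Carlo check of the covariance of the generalized second
# derivative of W. Grid and test functions are arbitrary choices of mine.
import numpy as np

rng = np.random.default_rng(3)
n_paths, dt = 10_000, 0.01
t = np.arange(dt, 10.0, dt)

phi, psi = np.exp(-(t - 4.0) ** 2), np.exp(-(t - 4.5) ** 2)
ddphi = np.gradient(np.gradient(phi, dt), dt)     # phi''
ddpsi = np.gradient(np.gradient(psi, dt), dt)     # psi''

W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, t.size)), axis=1)
Y_phi = (W * ddphi).sum(axis=1) * dt              # <W, phi''> per path
Y_psi = (W * ddpsi).sum(axis=1) * dt

print(np.mean(Y_phi * Y_psi))                     # Monte Carlo covariance
print(-np.sum(ddphi * psi) * dt)                  # ~ -int phi'' psi dt
```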

Additional note: In response to a good comment, how do we know that the processes $\dot{W}$ and $\ddot{W}$ are Gaussian? First, a generalized Gaussian random process $\eta$ is one for which any random vector formed by testing against $N$ functions is (multivariate) Gaussian, i.e. if

$$ X_{\varphi_1:\varphi_N} = [\langle \eta,\varphi_1\rangle,\ldots,\langle \eta,\varphi_N\rangle ]^t \in \Bbb{R}^N $$ then $\eta$ is Gaussian if and only if $X_{\varphi_1:\varphi_N}$ is Gaussian for every choice of $(\varphi_1,\ldots,\varphi_N)\in \mathcal{D}^N$. With this definition, it is easy to show that if $W$ is a classical Gaussian random process - say one with almost surely continuous paths such as the Wiener process - then $W$ is also a generalized Gaussian random process.

Then, Gaussianity of the (generalized) derivatives of $W$ follows from the definitions $$ \langle\dot{W},\varphi\rangle := - \langle W,\dot{\varphi}\rangle\\ \langle\ddot{W},\varphi\rangle := \langle W,\ddot{\varphi}\rangle $$ Since $W$ is a generalized Gaussian R.P., $\dot{W}$ and $\ddot{W}$ are as well.
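As a quick empirical illustration of this definition (a sketch under the same discretization assumptions as the checks above), draws of $\langle W,\ddot{\varphi}\rangle$ for a fixed test function should look exactly Gaussian, since each draw is a (Riemann-sum) linear functional of a Gaussian path:

```python
# Sketch: normality check on draws of <W, phi''>. Grid and test function
# are arbitrary; scipy's D'Agostino-Pearson test does the checking.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_paths, dt = 10_000, 0.01
t = np.arange(dt, 10.0, dt)
ddphi = np.gradient(np.gradient(np.exp(-(t - 4.0) ** 2), dt), dt)

W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, t.size)), axis=1)
samples = (W * ddphi).sum(axis=1) * dt            # draws of <W, phi''>

print(stats.normaltest(samples).pvalue)              # should not be small
print(stats.skew(samples), stats.kurtosis(samples))  # both ~ 0
```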

icurays1
  • Good answer. However, in this answer and in the OP, only the first two moments are calculated. Clearly we treat the white noise process as some sort of generalized Gaussian process, and in the more familiar case the fact that linear operations such as differentiation lead to another Gaussian process suggests we should expect $\ddot{W}$ to be Gaussian, which would justify calculating only the first two moments. Still, I wouldn't mind seeing how that is made precise in the generalized stochastic process sense, if you had a moment? – Nadiels Sep 11 '17 at 13:27
  • @Nadiels Good comment, I added a note. – icurays1 Sep 11 '17 at 14:58
  • All right, I am interested in finding a numerical solution for the second derivative of the Wiener process. How could I do that? Would you be able to respond to this question: Numerical solution of the second derivative of a Wiener process? – IRO Nov 04 '17 at 09:59
  • Are there some measurability restrictions to $\eta$ here? – Andrei Kh Jan 15 '21 at 18:59
  • @AndreiKh certainly, by saying "$X_\varphi = \langle \eta,\varphi\rangle$ is an ordinary real R.V. for each test function $\varphi$", you're saying $X_\varphi$ is a measurable map from $\Omega$ to $\mathbb{R}$. – icurays1 Jan 15 '21 at 20:02