I think that this performance index is used when, in addition to minimizing the states and the control signal (represented by the terms $x^TQx$ and $u^TRu$), one is also interested in minimizing some output signal
$$
y = Cx + Du,$$
for example because of safety requirements or hard constraints (or for tracking an output; this depends on the nature of the model used).
In this case, expanding the weighted quadratic form $y^TNy$, for some positive definite matrix $N$, usually produces cross terms between $x$ and $u$ (represented by $x^TMu$).
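To make the expansion explicit, here is a short derivation, assuming $N$ is symmetric (which can be taken without loss of generality for a quadratic form):
$$
y^TNy = (Cx+Du)^TN(Cx+Du) = x^TC^TNCx + 2\,x^TC^TNDu + u^TD^TNDu.$$
So, under that assumption, the output penalty alone contributes $C^TNC$ to $Q$, $D^TND$ to $R$, and the cross weight $M = 2C^TND$, which vanishes when $D = 0$.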
You can look at the situation as follows: we are interested not only in minimizing a norm of the state and a norm of the input individually, for the usual reasons, but also in minimizing a weighted norm of linear combinations of $x$ and $u$. In this case you may write the performance index in terms of a single quadratic form
$$
\begin{bmatrix}x^T& u^T \end{bmatrix}\begin{bmatrix}Q & \frac{1}{2}M\\ \frac{1}{2}M^T & R \end{bmatrix} \begin{bmatrix}x\\ u \end{bmatrix}.$$
In other words, we work with the concatenated vector $z:=\begin{bmatrix}x\\ u \end{bmatrix}$, and we are interested in minimizing the squared weighted norm $\|z\|_S^2 = z^TSz$, where $S$ denotes the block matrix above.
In the stochastic case, this will correspond to minimizing the error variance in the output signal. Note that the output signal does not have to be one of the states (in many models the states are not even physical quantities and cannot be measured).
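As a quick numerical sanity check of the block form, here is a minimal NumPy sketch. The dimensions and the random matrices are arbitrary choices for illustration, and the weights $Q$, $R$, $M$ are taken to come from the output penalty $y^TNy$ alone (an assumption made here; in general extra state and input weights would be added on top).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 3, 2, 2  # state, input, output dimensions (arbitrary)

# Output map y = C x + D u with random entries (illustrative only)
C = rng.standard_normal((p, n))
D = rng.standard_normal((p, m))

# Symmetric positive definite output weight N
A = rng.standard_normal((p, p))
N = A @ A.T + p * np.eye(p)

# Weights induced by expanding y^T N y
Q = C.T @ N @ C
R = D.T @ N @ D
M = 2 * C.T @ N @ D  # cross weight, so the cross term is x^T M u

# Block matrix acting on the concatenated vector z = [x; u]
S = np.block([[Q, 0.5 * M], [0.5 * M.T, R]])

x = rng.standard_normal(n)
u = rng.standard_normal(m)
y = C @ x + D @ u
z = np.concatenate([x, u])

output_cost = y @ N @ y  # y^T N y
block_cost = z @ S @ z   # z^T S z

print(np.isclose(output_cost, block_cost))
```

The factor $\frac{1}{2}$ on the off-diagonal blocks is what makes the two copies of the cross term add up to exactly $x^TMu$.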