The way I see it, talking about slowdowns in simulations of a specific Turing machine $M_0$ doesn't make much sense. I could always just run $M_0$ itself and call this a simulation, which results in no slowdown at all. I could also hardwire the code of $M_0$, and in case the input is $M_0$, use some better algorithm (as D.W. did in his answer); a sketch of this dispatching trick is given below.
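To illustrate the "hardwiring" trick, here is a minimal sketch in Python. The names `M0_ENCODING`, `fast_algorithm_for_M0`, and `step_by_step_simulation` are purely illustrative placeholders, not part of any standard construction:

```python
# Illustrative sketch only: M0_ENCODING and both helpers are placeholders.

M0_ENCODING = "<some fixed encoding of M0>"

def fast_algorithm_for_M0(x: str) -> str:
    # Here we would run a better algorithm for the problem M0 solves.
    return "answer computed directly, with no simulation overhead"

def step_by_step_simulation(encoded_machine: str, x: str) -> str:
    # Generic simulation of an arbitrary machine, paying some slowdown.
    return "answer computed by generic simulation"

def simulate(encoded_machine: str, x: str) -> str:
    # If the input machine happens to be the hardwired M0, dispatch to the
    # special-purpose routine; otherwise fall back to generic simulation.
    if encoded_machine == M0_ENCODING:
        return fast_algorithm_for_M0(x)
    return step_by_step_simulation(encoded_machine, x)
```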
The more interesting question here is (in my opinion, at least): what is the optimal slowdown achievable when simulating an arbitrary Turing machine $M$ on some input $x$, asymptotically in terms of $|x|$ and the length of the description of $M$?
We look at all possible inputs and examine the worst-case slowdown (perhaps for some specific machine $M_0$ you can do a better job, but here we consider the worst-case running time).
More formally, let $\mathcal{U}(\langle M\rangle,x)$ denote a universal Turing machine, which takes as input an encoding of a machine $M$ and some string $x$, and outputs $M(x)$, or does not halt in case the computation of $M$ on $x$ does not halt. We know that we can implement $\mathcal{U}$ in such a way that if the computation $M(x)$ requires time $T$, then $\mathcal{U}(\langle M\rangle,x)$ requires time $O\left(T\log T\right)$. Here the $O$ notation hides constants which depend on the number of states and the alphabet size of $M$ (but are independent of $|x|$). Your question then translates to whether we can implement $\mathcal{U}$ so that the computation of $\mathcal{U}(\langle M\rangle,x)$ requires only $O(T)$ time.
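To make the simulation concrete, here is a minimal sketch (in Python, with a hypothetical transition-table encoding) of a direct step-by-step simulator. Each simulated step costs work that depends on the description of $M$ but not on $|x|$, which is the intuition behind the constants hidden in the $O$ notation; the extra $\log T$ factor only shows up when this loop is itself carried out by a fixed Turing machine rather than a random-access program:

```python
from typing import Dict, Tuple

# (state, read symbol) -> (new state, written symbol, head move in {-1, 0, +1})
Transition = Dict[Tuple[str, str], Tuple[str, str, int]]

def simulate_tm(delta: Transition, start: str, halting: set, x: str,
                blank: str = "_", max_steps: int = 10**6) -> str:
    """Simulate the encoded machine on input x, step by step."""
    tape = dict(enumerate(x))       # sparse tape: position -> symbol
    state, head = start, 0
    for _ in range(max_steps):
        if state in halting:
            return state            # reached a halting state
        sym = tape.get(head, blank)
        state, new_sym, move = delta[(state, sym)]
        tape[head] = new_sym
        head += move
    raise RuntimeError("step budget exceeded (the machine may not halt)")

# Example machine: flip 0s and 1s, then halt in state qa at the first blank.
delta = {
    ("q0", "0"): ("q0", "1", +1),
    ("q0", "1"): ("q0", "0", +1),
    ("q0", "_"): ("qa", "_", 0),
}
print(simulate_tm(delta, "q0", {"qa"}, "0110"))  # -> "qa"
```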
It seems that for single-tape machines it is not known whether this $\log T$ factor is necessary; however, for machines with $k\ge 2$ tapes we can avoid it (proved by Fürer, 1982). See this post by Kaveh for a detailed discussion and related quotes.