I'll try to give a partial answer to your second question:
Are there other ways to distill these states?
Yes, and you can distill many more magic states than the two you mention. As you may have noticed, the $[[5, 1, 3]]$ MSD protocol is not based on a transversal non-Clifford gate. In fact, the $|T\rangle$ state in your description is not even naturally understood as a non-Clifford gate applied to a stabilizer state. Luckily, the Stabilizer Reduction (SR) formalism (arXiv:0908.0838) lets you describe an $n$-to-$k$ MSD protocol in terms of an $[[n,k]]$ stabilizer code. The SR protocol of a code $\mathcal{Q}$ works as follows:
- Take an input state (usually assumed to be a tensor-product state $\rho^{\otimes n}$).
- Project the input state onto the codespace of $\mathcal{Q}$. This can be done by measuring the stabilizer generators of $\mathcal{Q}$ and post-selecting on the trivial ($+1$) syndrome.
- Decode the post-measurement state, which now lives in the codespace, back to $k$ physical qubits.
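To make these three steps concrete, here is a minimal NumPy sketch (my own illustration, not code from arXiv:0908.0838). It builds $\bar P = \prod_j (I + g_j)/2$ from the stabilizer generators $g_j$, post-selects the trivial syndrome, and, instead of writing an explicit decoding circuit, reads off the decoded qubit's Bloch vector from expectation values of chosen logical operators, which is equivalent for $k=1$. The helper names, the example code (the $[[5,1,3]]$ code with generators $XZZXI$ and its cyclic shifts, logicals $X^{\otimes 5}$ and $Z^{\otimes 5}$), and taking $|T\rangle$ to be an eigenvector of the Clifford $SH$ are all assumptions for illustration.

```python
import numpy as np
from functools import reduce

# Single-qubit Pauli matrices, indexed by label
PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli(label):
    """Matrix of a Pauli string such as 'XZZXI'."""
    return reduce(np.kron, [PAULI[c] for c in label])

def codespace_projector(generators):
    """P = prod_j (I + g_j) / 2 for commuting stabilizer generators g_j."""
    dim = 2 ** len(generators[0])
    P = np.eye(dim, dtype=complex)
    for g in generators:
        P = P @ (np.eye(dim) + pauli(g)) / 2
    return P

def stabilizer_reduction(rho, generators, logical_x, logical_z):
    """One SR round for k = 1 (hypothetical helper): project rho^{(x)n} onto
    the codespace (post-selecting +1 on every generator), then return the
    success probability and the decoded qubit's Bloch vector (<X>, <Y>, <Z>)."""
    n = len(generators[0])
    P = codespace_projector(generators)
    projected = P @ reduce(np.kron, [rho] * n) @ P
    p_success = np.trace(projected).real
    XL, ZL = pauli(logical_x), pauli(logical_z)
    YL = 1j * XL @ ZL  # logical Y, mirroring Y = iXZ on a single qubit
    bloch = np.array([np.trace(L @ projected).real for L in (XL, YL, ZL)])
    return p_success, bloch / p_success

# Example: ideal |T> input (an eigenvector of K = SH) to the [[5,1,3]] code
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])
_, vecs = np.linalg.eig(S @ H)
t = vecs[:, 0]
rho_T = np.outer(t, t.conj())
p, bloch = stabilizer_reduction(
    rho_T, ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"], "XXXXX", "ZZZZZ")
print(p, bloch)
```

For this ideal input, the success probability should come out close to $1/6$, and the three Bloch components should be equal with magnitude $1/\sqrt{3}$, i.e. the decoded qubit is again a $T$-type state.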
Mark Howard found that many small stabilizer codes can distill a variety of exotic magic states, including the $|T\rangle$ state (here). The smallest such MSD protocol turns out to be based on a $[[3, 1, 1]]$ code, and a four-qubit stabilizer code can also distill $|T\rangle$. The caveat for these known exotic protocols, however, is that they only suppress the input error linearly, i.e. $\epsilon_o \approx b\,\epsilon_i$ for some constant $b<1$. In contrast, for the $[[15, 1, 3]]$ protocol with input error rate $\epsilon_i \ll 1$, the output error rate conditioned on successful post-selection is $\epsilon_o \approx 35\epsilon_i^3$. One can show that protocols with linear error suppression have exponentially higher overhead than those with at least quadratic suppression (see arXiv:2412.04402).
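To see why linear suppression hurts so much, here is a rough round-counting sketch. It iterates the leading-order error map until a target error $\epsilon_t$ is reached and counts raw input states per output, ignoring post-selection failures. The linear protocol's parameters ($n=3$, $b=0.5$) are hypothetical, chosen only for illustration and not taken from Howard's codes; the cubic case uses the $35\epsilon_i^3$ formula for the 15-to-1 protocol. Under these assumptions the linear protocol needs about $\log(\epsilon_i/\epsilon_t)/\log(1/b)$ rounds, so its cost $n^{\text{rounds}}$ grows polynomially in $1/\epsilon_t$, while the cubic protocol needs only $O(\log\log(1/\epsilon_t))$ rounds and its cost grows polylogarithmically.

```python
def distill_until(eps_in, eps_target, step, n_in):
    """Apply the error map `step` round by round until the error is at most
    eps_target. Returns (rounds, raw input states per output state),
    counting n_in inputs per round and ignoring post-selection failures."""
    eps, rounds = eps_in, 0
    while eps > eps_target:
        nxt = step(eps)
        if nxt >= eps:
            raise ValueError("no error suppression at eps = %g" % eps)
        eps, rounds = nxt, rounds + 1
    return rounds, n_in ** rounds

# Hypothetical 3-to-1 protocol with linear suppression, b = 0.5 (illustrative only)
linear = distill_until(1e-2, 1e-12, lambda e: 0.5 * e, 3)
# 15-to-1 protocol with eps_out ~ 35 eps_in^3 (leading-order formula)
cubic = distill_until(1e-2, 1e-12, lambda e: 35 * e ** 3, 15)
print(linear, cubic)  # (34, 16677181699666569) (3, 3375)
```

Even with a fairly generous $b$, the linear protocol needs on the order of $10^{16}$ raw states per output here, versus a few thousand for 15-to-1.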
Another fact worth knowing: it seems that a great many non-Clifford states can be distilled by some SR protocol. arXiv:2412.04402 also shows that, by concatenating MSD protocols with different target states, you can distill magic states $|\theta\rangle \propto |0\rangle + e^{i\theta}|1\rangle$ for many different values of $\theta$.
A conjecture on all possible protocols for distilling $|T\rangle$
Notably, the $|T\rangle$ state is an eigenstate of the Clifford gate $K=SH$, and the $[[5, 1, 3]]$ code admits a transversal $K$. I therefore suspect that
$$
\text{codes that admit a transversal $K$} \approx \text{codes that can distill $|T\rangle$}.
$$
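Here is a NumPy sketch for testing the weak form of "admits a transversal $K$" used in the reasoning that follows: it checks whether $K^{\otimes n}$ commutes with the codespace projector $\bar P$, i.e. maps the codespace to itself. It does not determine which logical gate $K^{\otimes n}$ implements. The stabilizer generators are standard choices for each code; the helper names are hypothetical.

```python
import numpy as np
from functools import reduce

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli(label):
    """Matrix of a Pauli string such as 'XZZXI'."""
    return reduce(np.kron, [PAULI[c] for c in label])

def codespace_projector(generators):
    """P = prod_j (I + g_j) / 2 for commuting stabilizer generators g_j."""
    dim = 2 ** len(generators[0])
    P = np.eye(dim, dtype=complex)
    for g in generators:
        P = P @ (np.eye(dim) + pauli(g)) / 2
    return P

def preserves_codespace(generators, U):
    """True if U^{(x)n} P U^{(x)n, dagger} == P, i.e. U^{(x)n} maps the codespace to itself."""
    n = len(generators[0])
    P = codespace_projector(generators)
    Un = reduce(np.kron, [U] * n)
    return np.allclose(Un @ P @ Un.conj().T, P)

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])
K = S @ H

FIVE_QUBIT = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]
STEANE = ["IIIXXXX", "IXXIIXX", "XIXIXIX", "IIIZZZZ", "IZZIIZZ", "ZIZIZIZ"]
BIT_FLIP = ["ZZI", "IZZ"]

for name, gens in [("[[5,1,3]]", FIVE_QUBIT), ("Steane", STEANE), ("bit-flip", BIT_FLIP)]:
    print(name, preserves_codespace(gens, K))
```

The $[[5,1,3]]$ and Steane codes pass, while the three-qubit bit-flip code does not, since $K^{\otimes 3}|000\rangle = |{+i}\rangle^{\otimes 3}$ leaves its codespace.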
The reasoning is as follows. If an MSD protocol can distill $|T\rangle$, then feeding it the ideal product state $|T\rangle^{\otimes n}$ should also return the ideal state $|T\rangle^{\otimes k}$. Assume $k=1$ for simplicity. Writing $\bar{P}$ for the codespace projector, this means
$$
|\bar T\rangle \langle \bar T| \propto \bar{P} (|T\rangle\langle T|)^{\otimes n} \bar{P}.
$$
If $K$ is transversal, i.e. $\bar{K} = K^{\otimes n}$, then $K^{\otimes n}$ maps the codespace to itself and, being unitary, also maps its orthogonal complement to itself, so it commutes with $\bar{P}$. Writing $K|T\rangle = \lambda|T\rangle$ with $|\lambda| = 1$, we get
$$
K^{\otimes n} \bar{P} |T\rangle^{\otimes n} = \bar{P} K^{\otimes n} |T\rangle^{\otimes n} = \lambda^{n} \bar{P} |T\rangle^{\otimes n}.
$$
So $\bar{P} |T\rangle^{\otimes n}$ (a logical state, assuming it is nonzero) is an eigenvector of $K^{\otimes n}$, i.e. of the logical operation $\bar{K}$. It must therefore be either $|\bar T\rangle$ or the other eigenstate of $\bar{K}$, which is equivalent to $|\bar T\rangle$ up to a Clifford. In other words, for a code with transversal $K$, the state $|T\rangle$ (up to a Clifford) is a fixed point of the SR protocol. This is why I expect the stabilizer codes that can distill $|T\rangle$ to largely coincide with the codes that admit a transversal $K$.
Since the $[[5, 1, 3]]$ code and the Steane code can also distill $|H\rangle$, I think the same conjecture applies to the Hadamard gate and the $|H\rangle$ state, with $K$ replaced by $H$.
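The fixed-point argument can be checked numerically. The NumPy sketch below (my own illustration; the helper names are hypothetical) takes an eigenvector $|\psi\rangle$ of a single-qubit gate $U$ with $U|\psi\rangle = \lambda|\psi\rangle$, forms $v = \bar P |\psi\rangle^{\otimes n}$, and tests whether $U^{\otimes n} v = \lambda^n v$. It runs the $[[5,1,3]]$ code with $U = K = SH$ and, for the $|H\rangle$ analogue, the Steane code with $U = H$ (the Steane code has a transversal Hadamard). The squared norm $\langle v|v\rangle$ is the probability of the trivial syndrome for ideal input.

```python
import numpy as np
from functools import reduce

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli(label):
    """Matrix of a Pauli string such as 'XZZXI'."""
    return reduce(np.kron, [PAULI[c] for c in label])

def codespace_projector(generators):
    """P = prod_j (I + g_j) / 2 for commuting stabilizer generators g_j."""
    dim = 2 ** len(generators[0])
    P = np.eye(dim, dtype=complex)
    for g in generators:
        P = P @ (np.eye(dim) + pauli(g)) / 2
    return P

def check_fixed_point(generators, U):
    """For an eigenvector |psi> of U (U|psi> = lam|psi>), test whether
    v = P |psi>^{(x)n} satisfies U^{(x)n} v = lam^n v. Returns (holds, <v|v>)."""
    n = len(generators[0])
    lams, vecs = np.linalg.eig(U)
    lam, psi = lams[0], vecs[:, 0]
    P = codespace_projector(generators)
    v = P @ reduce(np.kron, [psi] * n)
    Un = reduce(np.kron, [U] * n)
    return np.allclose(Un @ v, lam ** n * v), np.vdot(v, v).real

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])
K = S @ H

five = check_fixed_point(["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"], K)
steane = check_fixed_point(
    ["IIIXXXX", "IXXIIXX", "XIXIXIX", "IIIZZZZ", "IZZIIZZ", "ZIZIZIZ"], H)
print(five, steane)
```

I expect both checks to hold, with $\langle v|v\rangle = 1/6$ for the $[[5,1,3]]$ case and $9/128$ for the Steane case.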