Maybe a different point of view can help your understanding. The map $\varphi\colon S\times T\to ST$ defined by $\varphi(s,t)=st$ is surjective. Define an equivalence relation on $S\times T$ by declaring
$$
(s,t)\sim(s',t') \text{ if and only if }\varphi(s,t)=\varphi(s',t')
$$
(that is, $st=s't'$). Then $\varphi$ induces a bijection $S\times T/{\sim}\to ST$, so the number of equivalence classes is $|ST|$.
If we prove that all equivalence classes with respect to $\sim$ have the same number of elements $k$, we'll know that $|ST|=|S\times T|/k$, because $S\times T$ is partitioned into $|ST|$ classes, each with $k$ elements.
Now the equivalence class of $(s,t)$ is precisely $\varphi^{-1}(st)$. So once we have proved that $|\varphi^{-1}(x)|=|S\cap T|$ for each $x\in ST$, we are in the hoped-for situation with $k=|S\cap T|$, that is, $|ST|=|S|\,|T|/|S\cap T|$.
The idea is the same as the common proof of Lagrange's theorem: a subgroup $H$ of $G$ defines an equivalence relation $x\sim y$ if and only if $x^{-1}y\in H$. The equivalence classes have the same cardinality as $H$, so $|G|$ is $|H|$ times the number of equivalence classes. In the present case the number of equivalence classes is $|ST|$ and their common cardinality is $|S\cap T|$.
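The proof can be done as follows: let $x\in ST$ and fix $(s_0,t_0)\in S\times T$ such that $x=s_0t_0$. For $(s,t)\in\varphi^{-1}(x)$ we have $st=s_0t_0$, so $s_0^{-1}s=t_0t^{-1}$; the left-hand side lies in $S$ and the right-hand side lies in $T$, so this element belongs to $S\cap T$. Then define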
$$
f\colon \varphi^{-1}(x)\to S\cap T
$$
by $f(s,t)=s_0^{-1}s=t_0t^{-1}$ and prove it is a bijection. (Hint: the inverse sends $u\in S\cap T$ to $(s_0u,u^{-1}t_0)$.)
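If you want to see the counting argument at work on a concrete case, here is a small Python sketch. It is only an illustration, not part of the proof: the choice of group (permutations of $\{0,1,2,3\}$ written as tuples), the two subgroups, and the helper names `compose`, `inverse`, `generate` and `fibers` are all assumptions made for this example. It computes every fiber $\varphi^{-1}(x)$ and checks that $f$ maps each one bijectively onto $S\cap T$.

```python
from collections import defaultdict
from itertools import product


def compose(p, q):
    """Product pq of two permutations given as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Inverse of the permutation p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def generate(gens):
    """Subgroup generated by gens (closure under right multiplication)."""
    identity = tuple(range(len(gens[0])))
    elems = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = compose(g, h)
            if gh not in elems:
                elems.add(gh)
                frontier.append(gh)
    return elems


def fibers(S, T):
    """Map each x in ST to the fiber phi^{-1}(x) = {(s, t) : st = x}."""
    result = defaultdict(list)
    for s, t in product(S, T):
        result[compose(s, t)].append((s, t))
    return dict(result)


# Illustrative choice: two subgroups of the symmetric group on {0, 1, 2, 3}.
S = generate([(1, 2, 3, 0), (2, 1, 0, 3)])  # symmetries of a square
T = generate([(1, 2, 0, 3), (1, 0, 2, 3)])  # permutations fixing 3
meet = S & T  # the intersection of S and T

phi_fibers = fibers(S, T)

# Every fiber has |S ∩ T| elements, hence |ST| * |S ∩ T| = |S| * |T|.
assert all(len(fiber) == len(meet) for fiber in phi_fibers.values())
assert len(phi_fibers) * len(meet) == len(S) * len(T)

# f(s, t) = s0^{-1} s = t0 t^{-1} sends each fiber bijectively onto S ∩ T.
for fiber in phi_fibers.values():
    s0, t0 = fiber[0]
    images = [compose(inverse(s0), s) for s, t in fiber]
    assert images == [compose(t0, inverse(t)) for s, t in fiber]
    assert set(images) == meet and len(images) == len(meet)
```

With these particular subgroups, $|S|=8$, $|T|=6$ and $|S\cap T|=2$, so the $48$ pairs fall into $24$ fibers of two elements each and $|ST|=24$.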