There are different ways of looking at nilpotence, but at its heart nilpotency is a statement about centralization.
For instance, one can show the lower central series descends to the identity, or more strongly that if $N$ is a (finite) non-identity $p$-group normalized by a $p$-group $G$, then $[G,N] < N$. However, when $N$ is a normal subgroup of $G$, this is just saying that $N/[G,N]$ is a nontrivial subgroup of the center $Z(G/[G,N])$.
Another standard centralization fact is that an element of order $p^k$ acting linearly on a nonzero vector space in characteristic $p$ has a nonzero fixed point. Indeed, any finite group of order $p^k$ acting linearly on a nonzero vector space in characteristic $p$ has a nonzero simultaneous fixed point.
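In other words, the operators $\{ g - 1 : g \in G \}$ have a nonzero common kernel when acting on those vector spaces. On a finite-dimensional space, repeating the argument on the quotient by that kernel shows that $\{ g-1 : g \in G \}^n= 0$ for some large enough $n$, that is, this is a nilpotent set of operators.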
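To make the nilpotent-operator statement concrete, here is a small sketch for a single element: $g$ cyclically permutes $n$ basis vectors, and we look for the smallest power of $g - 1$ that vanishes over $\mathbb{F}_p$. The function names are my own, and the matrices are plain Python lists, so this is only meant for tiny examples.

```python
def cyclic_shift_minus_identity(n, p):
    """Matrix of g - 1 over F_p, where g cyclically permutes n basis vectors."""
    return [
        [((1 if j == (i + 1) % n else 0) - (1 if i == j else 0)) % p
         for j in range(n)]
        for i in range(n)
    ]


def mat_mul(a, b, p):
    """Product of two square matrices with entries reduced mod p."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) % p for j in range(n)]
        for i in range(n)
    ]


def nilpotency_index(m, p):
    """Smallest e >= 1 with m**e == 0 over F_p, or None if m is not nilpotent."""
    n = len(m)
    zero = [[0] * n for _ in range(n)]
    power = [row[:] for row in m]
    for e in range(1, n + 1):  # a nilpotent n x n matrix satisfies m**n == 0
        if power == zero:
            return e
        power = mat_mul(power, m, p)
    return None
```

For an element of order 3 in characteristic 3, `nilpotency_index(cyclic_shift_minus_identity(3, 3), 3)` returns `3`, matching $(g-1)^3 = g^3 - 1 = 0$. The same element in characteristic 2 gives `None`: there the order is coprime to the characteristic and $g - 1$ is not nilpotent.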
Proposition: If $N$ is a finite non-identity $p$-group and $G$ is a $p$-subgroup of $\operatorname{Aut}(N)$, then $[G,N] < N$.
Proof: Induct on $|N|$. Consider $G$ as a permutation group on the set $N$. It fixes the point $1_N$. If $G$ fixes the point $x \in N$, then it also fixes the points $x^i$. Indeed, $C_N(G)$, the set of fixed points of $G$ in $N$, is easily seen to be a subgroup of $N$. By Lagrange, $C_N(G)$ has order 1 or a multiple of $p$. The other orbits have sizes a multiple of $p$. Since the whole set $N$ has order a multiple of $p$, $C_N(G)$ must be non-identity. To take a quotient we want $C_N(G)$ to be normal in $N$; replacing $G$ by the $p$-group $G \operatorname{Inn}(N)$ only enlarges $[G,N]$ and gives $C_N(G) \leq Z(N)$, so we may assume this. Then $C_N(G)$ is normal and clearly $G$-invariant, so we can consider $\bar N = N/C_N(G)$ of strictly smaller order. If $\bar N = 1$, then $[G,N] = 1 < N$. Otherwise, by induction $[G,\bar N] < \bar N$, but $[G,\bar N] = [G,N] C_N(G)/C_N(G)$, so $[G,N] \leq [G,N] C_N(G) < N$, as was to be shown.
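The orbit count in the proof is easy to watch in a tiny case. The sketch below is my own choice of example, not part of the argument: $N = (\mathbb{Z}/3)^2$ written additively, and $G$ the group of order 3 generated by $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, so that $A^k(x,y) = (x + ky, y)$. It lists the fixed points, the orbit sizes, and the subgroup $[G,N]$ generated by the elements $g(n) - n$.

```python
from itertools import product

P = 3
N = list(product(range(P), repeat=2))  # N = (Z/3)^2, written additively
G = range(P)                           # exponents k of A = [[1, 1], [0, 1]]


def act(k, v):
    """Apply A^k to v = (x, y): (x, y) -> (x + k*y, y)."""
    x, y = v
    return ((x + k * y) % P, y)


def add(u, v):
    """Addition in N."""
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)


def generated_subgroup(gens):
    """Subgroup of N generated by gens (closure under adding generators)."""
    elems = {(0, 0)}
    frontier = [(0, 0)]
    while frontier:
        a = frontier.pop()
        for g in gens:
            s = add(a, g)
            if s not in elems:
                elems.add(s)
                frontier.append(s)
    return elems


# C_N(G): the points fixed by every element of G
fixed_points = [v for v in N if all(act(k, v) == v for k in G)]

# Orbit sizes: fixed points give orbits of size 1, all others size 3
orbits = {frozenset(act(k, v) for k in G) for v in N}
orbit_sizes = sorted(len(o) for o in orbits)

# [G, N] is generated by the commutators g(n) - n
commutators = {
    ((act(k, v)[0] - v[0]) % P, (act(k, v)[1] - v[1]) % P)
    for k in G for v in N
}
commutator_subgroup = generated_subgroup(commutators)
```

Here the nine points split into three fixed points and two orbits of size 3, and `commutator_subgroup` has order 3, so $[G,N] < N$ as the proposition predicts.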
Hopefully it is clear this is fundamentally the same proof as in the vector space case, but it describes a dual aspect of nilpotency: $[g,-]:N \to N:n \mapsto [g,n]$ is nilpotent, that is, $[g,[g,\dots,[g,[g,n]]\dots]]=1$ once the commutator is iterated often enough.
As you are studying these ideas it is a good idea to check out “coprime action”, which is the opposite. What I've described is nilpotent action when $[G,N] C_N(G) < N$. In coprime action, you have $N = [G,N] C_N(G)$, and if $N$ is abelian you even get a direct product.
I don't believe there can be any proof using only power-commutator ideas. The issue to my mind is two-fold: dihedral $2$-groups are generated by elements of order 2, but have unbounded nilpotency class, so no simple use of powers of generators can help. Even worse, Tarski monsters have prime exponent, so their power law is trivial. Any generally valid power-commutator formula would have to be valid both in the nilpotent extra-special groups of exponent $p$ and in the non-abelian simple groups of exponent $p$, which are clearly not nilpotent. McLain's (vector space) example is a locally nilpotent $p$-group (every finitely generated subgroup is a finite $p$-group), so it satisfies every law of nilpotent $p$-groups (which is a fairly trivial statement, hehe) including every power-commutator law. However, it has trivial center and is its own derived subgroup.
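The dihedral claim can be checked by brute force for small cases. The sketch below uses my own encoding: the pair `(r, s)` stands for $\rho^r \sigma^s$ in the dihedral group of order $2m$, with $\sigma \rho \sigma = \rho^{-1}$. It computes the lower central series directly from commutators, so it is only practical for small $m$.

```python
def mul(a, b, m):
    """Product in the dihedral group of order 2m; (r, s) means rot^r * flip^s."""
    (r1, s1), (r2, s2) = a, b
    return ((r1 + (-1) ** s1 * r2) % m, s1 ^ s2)


def inv(a, m):
    """Inverse: flips are involutions, rotations invert by negation."""
    r, s = a
    return (r, s) if s else ((-r) % m, 0)


def generate(gens, m):
    """Subgroup generated by gens (closure under right multiplication)."""
    identity = (0, 0)
    elems = {identity}
    frontier = [identity]
    while frontier:
        a = frontier.pop()
        for g in gens:
            b = mul(a, g, m)
            if b not in elems:
                elems.add(b)
                frontier.append(b)
    return elems


def nilpotency_class(m):
    """Class of the dihedral group of order 2m, or None if it is not nilpotent."""
    identity = (0, 0)
    group = [(r, s) for r in range(m) for s in (0, 1)]
    term = set(group)  # current term of the lower central series
    steps = 0
    while term != {identity}:
        # commutators [g, x] = g^-1 x^-1 g x with g in the group, x in the term
        comms = {
            mul(mul(inv(g, m), inv(x, m), m), mul(g, x, m), m)
            for g in group for x in term
        }
        new_term = generate(comms, m)
        if new_term == term:
            return None  # the series has stabilized above the identity
        term = new_term
        steps += 1
    return steps
```

For $m = 4, 8, 16$ the class comes out as $2, 3, 4$, even though in every case the two reflections `(0, 1)` and `(1, 1)` already generate the whole group. For $m = 3$ the function returns `None`, since that dihedral group is not nilpotent.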
Proposition: There is no proof that finite $p$-groups are nilpotent that only uses formulas, laws, etc. valid in all finite $p$-groups. There are no laws of finite $p$-groups that do not hold more generally in all groups.
Proof: It is fairly well known that a free group of finite rank is “residually-(finite-$p$-group)”, that is, that the intersection of the terms of its lower exponent-$p$ central series is the identity, with each quotient by a term of the series a finite $p$-group. This means that the free group is a subgroup of an (unrestricted) direct product of finite $p$-groups (in fact, it is a subdirect product, so it surjects onto each factor when that factor is viewed as a quotient of the direct product). Since any particular formula valid in all finite $p$-groups is valid in their direct product, that formula remains valid also on the subgroup. It also remains valid in quotient groups, and every finitely generated group is a quotient of a free group of finite rank. A law only involves finitely many elements at a time, so it then holds in every group. In plain language: the variety generated by the finite $p$-groups is the class of all groups. There are no laws of finite $p$-groups that do not hold more generally in all groups. $\square$
Thus the examples of Tarski and McLain should not be too surprising. A group can be suspiciously similar to a finite $p$-group when you look at any finite collection of elements, and yet be completely different when considered as a whole.
This is not to say power-commutator relations are not important. They are fundamental when understanding $p$-groups. However, their utility comes once one knows that the $p$-group is nilpotent (or more importantly to my mind: that it acts nilpotently on finite $p$-groups).