Let $x, y \in \mathfrak{g}$ be any two elements of a Lie algebra $\mathfrak{g}$. Then in $U(\mathfrak{g})$ we have the relation $xy = [x, y] + yx$, where $[x, y] \in \mathfrak{g}$ is the Lie bracket (rather than the commutator bracket). Considering $x^2 y$ and applying this relation three times:
$$ \begin{aligned}
x^2 y
&= x([x, y] + yx) \\
&= x[x, y] + (xy) x \\
&= ([x, [x, y]] + [x, y]x) + ([x, y] + yx)x \\
&= [x, [x, y]] + 2 [x, y]x + yx^2.
\end{aligned}$$
From here we can do induction: suppose that for $x, y \in \mathfrak{g}$ and for some $n \geq 1$ we have
$$ x^n y = \sum_{k = 0}^n \binom{n}{k} (\operatorname{ad}_x^k y) x^{n - k}.$$
Then we have
$$\begin{aligned}
x^{n + 1} y
&= x \sum_{k = 0}^n \binom{n}{k} (\operatorname{ad}_x^k y) x^{n - k} \\
&= \sum_{k = 0}^n \binom{n}{k} \left( [x, \operatorname{ad}_x^k y] + (\operatorname{ad}_x^k y)x \right) x^{n - k} \\
&= \sum_{k = 0}^n \binom{n}{k} \left( (\operatorname{ad}_x^{k + 1} y)x^{n-k} + (\operatorname{ad}_x^k y) x^{n + 1 - k} \right) \\
&= \sum_{j = 1}^{n + 1} \binom{n}{j - 1} (\operatorname{ad}_x^j y)x^{n + 1 - j} + \sum_{j = 0}^{n} \binom{n}{j} (\operatorname{ad}_x^j y)x^{n + 1 - j} \\
&= (\operatorname{ad}_x^{n+1} y) + \sum_{j = 1}^n \left( \binom{n}{j - 1} + \binom{n}{j}\right)(\operatorname{ad}_x^j y)x^{n + 1 - j} + y x^{n + 1} \\
&= (\operatorname{ad}_x^{n+1} y) + \sum_{j = 1}^n \binom{n + 1}{j}(\operatorname{ad}_x^j y)x^{n + 1 - j} + y x^{n + 1} \\
&=\sum_{k = 0}^{n+1} \binom{n+1}{k} (\operatorname{ad}_x^k y) x^{n + 1 - k}.
\end{aligned}$$
As you can see, the induction is long and tedious, and it would be nice to be able to directly use the binomial theorem.
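
To see what the formula says in a concrete case (an illustration only, with the relation $[x, y] = y$ chosen here as an assumption, as in the two-dimensional non-abelian Lie algebra): then $\operatorname{ad}_x^k y = y$ for every $k \geq 0$, and the formula collapses to
$$ x^n y = \sum_{k = 0}^n \binom{n}{k}\, y\, x^{n - k} = y (x + 1)^n, $$
which agrees with what one gets by iterating $xy = y(x + 1)$ directly.
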
Alternatively, we can argue more abstractly. On the tensor algebra $T(\mathfrak{g})$, given an element $x \in \mathfrak{g}$ we have three linear operators: $l_x(\tau) = x \otimes \tau$ (left multiplication by $x$), $r_x(\tau) = \tau \otimes x$ (right multiplication by $x$), and the derivation $\operatorname{ad}_x$, defined on $\mathfrak{g}$ by $\operatorname{ad}_x(y) = [x, y]$ and extended as a derivation, so that $\operatorname{ad}_x(\tau \otimes \omega) = \operatorname{ad}_x(\tau) \otimes \omega + \tau \otimes \operatorname{ad}_x(\omega)$.
These operators pairwise commute, which is easy to see for $l_x$ and $r_x$. For $\operatorname{ad}_x$ and $l_x$ we check
$$ \operatorname{ad}_x l_x \tau = \operatorname{ad}_x(x \otimes \tau) = \operatorname{ad}_x(x) \otimes \tau + x \otimes \operatorname{ad}_x(\tau) = 0 + x \otimes \operatorname{ad}_x(\tau) = l_x \operatorname{ad}_x \tau,$$
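and the proof for $\operatorname{ad}_x$ and $r_x$ is similar. All three operators preserve the defining ideal of $U(\mathfrak{g})$ (for $\operatorname{ad}_x$ this uses the Jacobi identity), so they descend to the quotient. From the relation $xy = [x, y] + yx$, we find that in $U(\mathfrak{g})$ we have the equality of operators $l_x = \operatorname{ad}_x + r_x$. Since $\operatorname{ad}_x$ and $r_x$ commute, the binomial theorem applies and the proof is very easy: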
$$ l_x^n y = (\operatorname{ad}_x + r_x)^n y = \sum_{k = 0}^n \binom{n}{k} r_x^{n - k} \operatorname{ad}_x^k y = \sum_{k = 0}^n \binom{n}{k} (\operatorname{ad}_x^k y) x^{n - k}.$$
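
As a numerical sanity check (not a proof), the identity can be tested in a matrix representation. The sketch below assumes we work in $\mathfrak{gl}_3$ with the commutator as the Lie bracket, so that products in $U(\mathfrak{g})$ map to ordinary matrix products. The helper names (`bracket`, `ad_power`, `lhs`, `rhs`) and the sample matrices are made up for illustration.

```python
from math import comb

def matmul(a, b):
    """Product of two square integer matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(a, b):
    """Entrywise sum of two matrices."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    """Multiply every entry of the matrix a by the scalar c."""
    return [[c * p for p in row] for row in a]

def matpow(a, p):
    """a raised to the non-negative integer power p (a^0 is the identity)."""
    n = len(a)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(p):
        result = matmul(result, a)
    return result

def bracket(a, b):
    """Commutator [a, b] = ab - ba, playing the role of the Lie bracket."""
    return matadd(matmul(a, b), scale(-1, matmul(b, a)))

def ad_power(x, y, k):
    """Apply ad_x to y a total of k times."""
    for _ in range(k):
        y = bracket(x, y)
    return y

def lhs(x, y, n):
    """Left-hand side x^n y."""
    return matmul(matpow(x, n), y)

def rhs(x, y, n):
    """Right-hand side: sum over k of binom(n, k) (ad_x^k y) x^(n-k)."""
    size = len(x)
    total = [[0] * size for _ in range(size)]
    for k in range(n + 1):
        term = matmul(ad_power(x, y, k), matpow(x, n - k))
        total = matadd(total, scale(comb(n, k), term))
    return total

# Arbitrary sample matrices (hypothetical test data).
x = [[1, 2, 0], [0, 3, 1], [4, 0, 2]]
y = [[0, 1, 5], [2, 0, 0], [1, 1, 3]]

for n in range(6):
    assert lhs(x, y, n) == rhs(x, y, n)
```

Integer matrices keep the arithmetic exact, so the equality test is not affected by rounding. A passing check only confirms the identity in this one representation; the operator argument above is what proves it in $U(\mathfrak{g})$.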
$$x^{N+1}y=\sum_{k=0}^N\binom{n}{k}\left((n+1)(ad_x)^{k+1}(y)+(n+1)[(ad_x)^k(y),x]+x(ad_x)^k(y)\right)x^{N-k}$$
But it doesn't seem to get me much further than just going in circles.
– njlieta Jan 12 '21 at 21:17