I decided to post an answer to my own question as an addition to the already good answer by levap. My method is a direct derivation by induction, based on the fact that the inversion map $I(T) = T^{-1}$ has derivative $I'(T)[A] = -T^{-1}AT^{-1}$.
The base case $k=1$ is already covered by the above formula (whose proof can be found here). Now, assume that
$$
I^{(k)}(T)[A_1,\dots,A_k] = (-1)^{k} \sum_{\sigma\in S_k} T^{-1}A_{\sigma(1)}T^{-1}\dots T^{-1}A_{\sigma(k)} T^{-1},
$$
holds ($S_k$ is the symmetric group on $k$ letters). I will rewrite it as
$$
I^{(k)}(T)[A_1,\dots,A_k] = (-1)^{k} \sum_{\sigma\in S_k} (M_{k,\sigma}\circ I)(T)[A_1,\dots,A_k],
$$
where $M_{k,\sigma}(T)$ is the $k$-linear map
$$
M_{k,\sigma}(T)[A_1,\dots,A_k] = T A_{\sigma(1)}T \dots T A_{\sigma(k)} T.
$$
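To make the notation concrete, here is the case $k=2$ written out (a worked instance only, not part of the induction). $S_2$ contains the identity and the transposition $(1\,2)$, so
$$\begin{align}
M_{2,\mathrm{id}}(T)[A_1,A_2] &= T A_1 T A_2 T, \\
M_{2,(1\,2)}(T)[A_1,A_2] &= T A_2 T A_1 T,
\end{align}$$
and the hypothesis for $k=2$ reads
$$
I''(T)[A_1,A_2] = T^{-1}A_1T^{-1}A_2T^{-1} + T^{-1}A_2T^{-1}A_1T^{-1}.
$$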
Expanding the product, and noting that every term containing two or more factors of $S$ is $O(\|S\|^2)$, we see that
$$\begin{align}
M_{k,\sigma}(T+S)&[A_1,\dots,A_k] - M_{k,\sigma}(T)[A_1,\dots,A_k] \\
= \ \ \ & (S A_{\sigma(1)}TA_{\sigma(2)}T \dots T A_{\sigma(k)} T) + (T A_{\sigma(1)}S A_{\sigma(2)}T \dots T A_{\sigma(k)} T) + \dots \\
&\ \ \ + (T A_{\sigma(1)}TA_{\sigma(2)} T \dots T A_{\sigma(k)} S) + o(\|S\|),
\end{align}$$
which implies that the derivative of $M_{k,\sigma}$ is given by
$$\begin{align}
M'_{k,\sigma}(T)[A_1,\dots,A_k,B] &= (B A_{\sigma(1)}TA_{\sigma(2)}T \dots T A_{\sigma(k)} T) + (T A_{\sigma(1)}B A_{\sigma(2)}T \dots T A_{\sigma(k)} T) + \dots \\
&\ \ \ + (T A_{\sigma(1)}TA_{\sigma(2)} T \dots T A_{\sigma(k)} B).
\end{align}$$
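As a sanity check, the smallest case $k=1$ can be expanded completely. Writing $M_1(T)[A] = TAT$ (there is only one permutation, so I drop $\sigma$), we get
$$\begin{align}
M_1(T+S)[A] - M_1(T)[A] &= (T+S)A(T+S) - TAT \\
&= SAT + TAS + SAS.
\end{align}$$
Assuming the norm is submultiplicative, $\|SAS\| \le \|S\|^2\|A\|$, so the last term is $o(\|S\|)$ and $M_1'(T)[A,B] = BAT + TAB$.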
By the chain rule (for multilinear maps), we have
$$\begin{align}
(M_{k,\sigma}\circ I)'(T)[A_1,\dots,A_k,B] &= (M'_{k,\sigma}\circ I)(T)[A_1,\dots,A_k,I'(T)[B]] \\
&= (M'_{k,\sigma})(T^{-1})[A_1,\dots,A_k,-T^{-1}BT^{-1}] \\
&= (-T^{-1}BT^{-1}) A_{\sigma(1)}T^{-1} A_{\sigma(2)}T^{-1} \dots T^{-1} A_{\sigma(k)} T^{-1} + \dots \\
&\ \ \ \ + T^{-1} A_{\sigma(1)}T^{-1} A_{\sigma(2)} T^{-1} \dots T^{-1} A_{\sigma(k)} (-T^{-1}BT^{-1}).
\end{align}$$
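For $k=1$ this computation can be carried out without any dots. Using $M_1(T)[A] = TAT$ and $M_1'(T)[A,B] = BAT + TAB$ (the $k=1$ instance of the derivative formula for $M_{k,\sigma}$), we get
$$\begin{align}
(M_1\circ I)'(T)[A,B] &= M_1'(T^{-1})[A,-T^{-1}BT^{-1}] \\
&= -T^{-1}BT^{-1}AT^{-1} - T^{-1}AT^{-1}BT^{-1}.
\end{align}$$
Since $I'(T)[A] = -(M_1\circ I)(T)[A]$, this gives $I''(T)[A,B] = T^{-1}BT^{-1}AT^{-1} + T^{-1}AT^{-1}BT^{-1}$, which is symmetric in $A$ and $B$, as a second derivative should be.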
Finally, we differentiate the induction hypothesis and apply the above formula with $B = A_{k+1}$ to get
$$\begin{align}
I^{(k+1)}(T)[A_1,\dots,A_k,A_{k+1}] &= (-1)^{k} \sum_{\sigma\in S_k} (M_{k,\sigma}\circ I)'(T)[A_1,\dots,A_k,A_{k+1}] \\
&= (-1)^{k} \sum_{\sigma\in S_k}
(-T^{-1}A_{k+1}T^{-1} A_{\sigma(1)}T^{-1} A_{\sigma(2)}T^{-1} \dots T^{-1} A_{\sigma(k)} T^{-1} - \dots \\
&\quad\quad\quad\quad\quad\quad - T^{-1} A_{\sigma(1)}T^{-1} A_{\sigma(2)} T^{-1} \dots T^{-1} A_{\sigma(k)}T^{-1}A_{k+1} T^{-1}) \\
&= (-1)^{k+1} \sum_{\rho\in S_{k+1}} (M_{k+1,\rho}\circ I)(T)[A_1,\dots,A_{k+1}]
\end{align}$$
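where the last equality holds because inserting $A_{k+1}$ into each of the $k+1$ slots of the product indexed by $\sigma$ yields, as $\sigma$ ranges over $S_k$, each ordering $\rho\in S_{k+1}$ of $A_1,\dots,A_{k+1}$ exactly once. This concludes the proof.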
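To illustrate the counting in the last step for $k=2$, write each permutation as the sequence of its values. Inserting the index $3$ into the three possible slots of each ordering in $S_2$ gives
$$\begin{align}
(1,2) &\mapsto (3,1,2),\ (1,3,2),\ (1,2,3), \\
(2,1) &\mapsto (3,2,1),\ (2,3,1),\ (2,1,3),
\end{align}$$
which are the $2\cdot 3 = 3!$ elements of $S_3$, each appearing exactly once. In general the same insertion produces $k!\,(k+1) = (k+1)!$ distinct orderings.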