According to the Adam optimization update rule: $$m \leftarrow \beta_1 m + (1 - \beta_1)\nabla J(\theta)$$ $$v \leftarrow \beta_2 v + (1 - \beta_2)(\nabla J(\theta) \odot \nabla J(\theta))$$ $$\theta \leftarrow \theta - \alpha \frac{m}{\sqrt{v}}$$
From the equations, $m$ is an exponentially decaying average of the gradient for each parameter in $\theta$, and $v$ does a similar thing (kind of) but with the element-wise squared gradient, i.e. the gradient's magnitude. Then, when we update the parameters $\theta$, we divide the accumulated gradient by the square root of the accumulated squared gradient, so parameters whose gradients have been small receive relatively larger updates, and vice-versa.
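To make the per-parameter scaling concrete, here is a minimal NumPy sketch of the simplified update quoted above (no bias correction; the small `eps` term for numerical stability is my addition and is not shown in the equations):

```python
import numpy as np

def adam_step(theta, grad, m, v, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One simplified Adam step, matching the update rule quoted above."""
    m = beta1 * m + (1 - beta1) * grad             # decaying average of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad      # decaying average of the squared gradient
    theta = theta - alpha * m / (np.sqrt(v) + eps) # per-parameter scaled update
    return theta, m, v
```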
In gradient clipping, we do a similar kind of thing by rescaling the gradient vector against a threshold (see the sketch below). My question is: why do we need gradient clipping to solve the problem of exploding gradients when we can use the Adam optimizer to do a controlled search of the parameter space for a minimum?
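For reference, by clipping I mean norm-based rescaling against a threshold, something like this sketch (the function name and the `max_norm` default are just for illustration):

```python
import numpy as np

def clip_by_global_norm(grad, max_norm=1.0):
    """Rescale the whole gradient vector if its L2 norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)  # direction preserved, magnitude capped
    return grad
```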