I am learning machine learning and encountered the KL divergence:
$$ \int p(x) \log\left(\frac{p(x)}{q(x)}\right) \, \text{d}x $$
I understand that this quantity measures the difference between two probability distributions. I have rewritten the formula as follows (where $p(x)$ is the true distribution and $q(x)$ is the approximating distribution):
$$ \int p(x) \log\left(\frac{1}{q(x)}\right) \, \text{d}x - \int p(x) \log\left(\frac{1}{p(x)}\right) \, \text{d}x $$
I read this as the difference between the entropy of the approximating distribution and the entropy of the true distribution.
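To check the rearrangement numerically, here is a small sketch with discrete distributions (so the integrals become sums); the particular values of `p` and `q` are just illustrative:

```python
import numpy as np

# Two discrete distributions over the same support (illustrative values).
p = np.array([0.5, 0.3, 0.2])   # "true" distribution p(x)
q = np.array([0.4, 0.4, 0.2])   # approximating distribution q(x)

# KL divergence: sum_x p(x) * log(p(x) / q(x))
kl = np.sum(p * np.log(p / q))

# First term: sum_x p(x) * log(1 / q(x))
first_term = np.sum(p * np.log(1 / q))

# Second term: sum_x p(x) * log(1 / p(x)), i.e. the entropy of p
second_term = np.sum(p * np.log(1 / p))

# Both prints give the same number, matching the rearrangement above.
print(kl)
print(first_term - second_term)
```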
My question is: why is the entropy of the approximating distribution weighted by $p(x)$ instead of $q(x)$?