This is a question I have always had, relating to probability and statistics.
In many applications (e.g. estimating the parameters of a probability distribution), we almost always end up trying to optimize the "log likelihood" instead of just the "likelihood".
From a computational standpoint, I have heard that this is much easier - for example, differentiating the log likelihood can remove exponential terms and thus make the optimization process simpler.
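To illustrate what I mean, here is a small example I worked through (assuming a single observation $x$ from a normal distribution with known variance $1$):

$$L(\mu) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2}\right), \qquad \log L(\mu) = -\frac{1}{2}\log(2\pi) - \frac{(x-\mu)^2}{2}.$$

Differentiating $L(\mu)$ directly requires applying the chain rule through the exponential, whereas $\frac{d}{d\mu}\log L(\mu) = x - \mu$ is linear in $\mu$ and immediately gives the stationary point $\mu = x$.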
From a mathematical standpoint, we are often told (without explanation) that maximizing the log likelihood function is equivalent to maximizing the original likelihood function - for example, the stationary points (i.e. where the derivatives are 0) of the log likelihood function apparently coincide with the stationary points of the original likelihood function. Therefore, optimizing either function should result in identical parameter estimates.
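The closest I have come to an explanation on my own is the chain rule (assuming $L(\theta) > 0$ everywhere, so that the log is defined):

$$\frac{d}{d\theta} \log L(\theta) = \frac{L'(\theta)}{L(\theta)},$$

which is zero precisely when $L'(\theta) = 0$ - but I do not know whether this reasoning generalizes beyond differentiable, strictly positive functions.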
My question relates to the mathematics of this phenomenon: does taking the logarithm of a function always preserve the stationary points of the original function - and if so, why?
As a reference, I found the following quote (from "Why we consider log likelihood instead of Likelihood in Gaussian Distribution"):
"Because the logarithm is monotonically increasing function of its argument, maximization of the log of a function is equivalent to maximization of the function itself."
Thus, how do I know that the above statement is true?
I will assume that "the logarithm is a monotonically increasing function of its argument" is true by definition - can we mathematically prove that:
1. Maximizing the log of a function is ALWAYS equivalent to maximizing the function itself?
2. More generally, given any monotonically increasing function, does maximizing a function and maximizing any monotonically increasing transformation of that function ALWAYS result in the same maximizers?
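As a numerical sanity check on question 2, I tried the quick sketch below (the particular function $\theta^2(1-\theta)^4$ and the grid are arbitrary choices of mine), and the argmax does seem to be preserved - but I would like a proof rather than an experiment:

```python
import numpy as np

# An arbitrary positive "likelihood-shaped" function of one parameter theta;
# its analytic maximizer on (0, 1) is theta = 1/3.
def likelihood(theta):
    return theta**2 * (1.0 - theta)**4

# Evaluate on a fine grid strictly inside (0, 1) so the log is defined.
theta_grid = np.linspace(1e-6, 1.0 - 1e-6, 100_001)
L = likelihood(theta_grid)
log_L = np.log(L)

# Both argmaxes land on the same grid point.
print(theta_grid[np.argmax(L)])      # ~0.33333
print(theta_grid[np.argmax(log_L)])  # identical value
```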
Thanks!