I want to fit data to the exponential model
$$ y=Ae^{Bx} $$
MathWorld (https://mathworld.wolfram.com/LeastSquaresFittingExponential.html) gives the following expression as its equation (5):
$$ \sum_{i=1}^{n} y_i\left(\ln y_i - a - b x_i\right)^2 \tag{5} $$
where
$$ \ln{\left(A\right)}=a $$
$$ B=b $$
It says this sum should be minimized. This works better than linearizing the data and doing ordinary linear regression on the equation below.
$$ \ln(y)=\ln(A)+Bx $$
I know that this works better, but what is the justification for using equation (5), or how was it derived?
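
For concreteness, here is a minimal sketch of the two fits being compared. It assumes NumPy; the function names `fit_log_linear` and `fit_weighted` are hypothetical, and the data are synthetic (true values $A=2$, $B=0.5$ with additive Gaussian noise), chosen only for illustration. The weighted fit relies on the fact that `np.polyfit` applies its weights `w` to the unsquared residuals, so passing `w = sqrt(y)` minimizes the sum in equation (5).

```python
import numpy as np


def fit_log_linear(x, y):
    """Unweighted fit: minimize sum (ln y_i - a - b x_i)^2."""
    b, a = np.polyfit(x, np.log(y), 1)
    return np.exp(a), b  # A = e^a, B = b


def fit_weighted(x, y):
    """Weighted fit: minimize sum y_i (ln y_i - a - b x_i)^2, as in equation (5)."""
    # polyfit multiplies each residual by w before squaring,
    # so w = sqrt(y) gives a weight of y_i on the squared residual.
    b, a = np.polyfit(x, np.log(y), 1, w=np.sqrt(y))
    return np.exp(a), b


# Synthetic data (an assumption for illustration, not from any real data set).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
y = 2.0 * np.exp(0.5 * x) + rng.normal(scale=0.3, size=x.size)

print("log-linear fit (A, B):", fit_log_linear(x, y))
print("weighted fit   (A, B):", fit_weighted(x, y))
```

Both functions require every $y_i$ to be positive, since they take $\ln y_i$ and, in the weighted case, $\sqrt{y_i}$.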