My question is: when is the approximation of the Hessian matrix, $H \approx J^T J$, reasonable?
One well-known case is that it is reasonable to approximate the Hessian using only first-order derivatives (the Jacobian), i.e. $H \approx J^T J$, when solving a non-linear least-squares problem; this is the Gauss-Newton method. In other words, this is the case when the cost function (energy function) is a sum of squared (possibly non-linear) residuals. The approximation can be derived from Newton's method. See the wiki: https://en.wikipedia.org/wiki/Gauss–Newton_algorithm
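For concreteness, here is the standard derivation I have in mind (in my notation, $r(x)$ is the residual vector and $J = \partial r / \partial x$ its Jacobian):

$$f(x) = \tfrac{1}{2}\, r(x)^T r(x), \qquad \nabla f(x) = J^T r(x), \qquad \nabla^2 f(x) = J^T J + \sum_i r_i(x)\, \nabla^2 r_i(x).$$

Gauss-Newton keeps only the $J^T J$ term and drops $\sum_i r_i \nabla^2 r_i$, which is justified when the residuals are small near the optimum and/or the residual functions are close to linear.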
But are there any other cases in which the approximation $H \approx J^T J$ is reasonable? For example, for some general non-linear optimization problems?
I have noticed that some papers (in the field of Computer Vision) use the Gauss-Newton or Levenberg–Marquardt (L-M) algorithm to solve non-linear, non-least-squares (i.e. general non-linear) optimization problems, which in fact amounts to using the approximation $H \approx J^T J$. But none of them actually explain why this is reasonable.
I have used this strategy in my own research too, and my experiments showed it to be efficient. But I still don't know how to justify the Hessian approximation mathematically. (I was asked about this by a reviewer on a recent journal submission.)
So again, are there any hints on how to justify the approximation of the Hessian matrix $H \approx J^T J$ for some general non-linear optimization problems?
Thank you very much for your kind help!

