
Adding to the answers already given (the complexity of computing full Hessians): first-order methods, i.e. methods that only require gradients, are the methods of choice when extreme solution accuracy is not required, which is the case in many practical settings (including ML).
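To make the cost gap concrete, here is a minimal sketch (my own illustration, with made-up H and g standing in for a dense Hessian and a gradient): a gradient step is O(n) once the gradient is known, while a Newton step needs O(n^2) storage for the Hessian plus an O(n^3) linear solve.

    import numpy as np

    n = 2000
    # Stand-in symmetric positive definite "Hessian" and "gradient"
    A = np.random.rand(n, n)
    H = A @ A.T + n * np.eye(n)
    g = np.random.rand(n)
    x = np.zeros(n)

    step_size = 1e-3
    x_gd = x - step_size * g               # first-order step: O(n) extra work
    x_newton = x - np.linalg.solve(H, g)   # Newton step: O(n^3) dense solve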

One of Newton's method's major selling points is that once it is close to a local minimum, under the right smoothness assumptions it converges faster and faster to an exact optimizer - in practice, the number of correct significant digits roughly doubles each iteration once you are close enough. This is Newton's method's region of quadratic convergence; see [0], Theorem 14.1, p. 3, and the small numerical sketch after the reference.

[0] https://www.stat.cmu.edu/~ryantibs/convexopt-F16/scribes/new...
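For intuition, a minimal sketch (the 1-D objective f(x) = x - log x, with minimizer x* = 1, is my own example, not taken from [0]): near the minimizer the error is roughly squared each Newton step, so the count of correct digits roughly doubles per iteration.

    def fprime(x):   # f'(x) = 1 - 1/x
        return 1.0 - 1.0 / x

    def fsecond(x):  # f''(x) = 1/x^2
        return 1.0 / (x * x)

    x = 0.5  # start reasonably close to the minimizer x* = 1
    for k in range(6):
        x = x - fprime(x) / fsecond(x)          # Newton step
        print(k + 1, x, abs(x - 1.0))           # error roughly squares each pass

The printed errors fall like 0.25, 0.06, 0.004, 1.5e-5, 2.3e-10, ... until floating-point precision bottoms out, which is the "doubling digits" behavior the theorem describes.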


