A few things to realize. The Newton method will be wrong at times too. There will be cases where the closest distance does not occur at any root, so calculating the roots and checking them will give you the wrong answer, and there's essentially nothing you can do about it within that approach.
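As a toy illustration of that failure mode (my own example, not from the original problem): take $A(s) = (s, 0)$ and $B(t) = (t+2, 1)$ for $s, t \in [0,1]$. The squared distance is

$$D(s,t) = (s - t - 2)^2 + 1,$$

which is minimized at the corner $s = 1$, $t = 0$ (distance $\sqrt{2}$), yet $\partial D/\partial s = 2(s - t - 2) \le -2$ everywhere on the domain, so $\nabla D$ has no root for Newton to find. Any root-based method has to check the parameter endpoints separately.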
The naive discretization method is going to be $n^2$ in the number of sample points, so it either bogs down as you tighten the resolution or stays wrong if you don't. But you can improve this a lot. I would strongly suggest variable resolution for that method: find the nearest pair of points between the two graphs over $k$ discrete samples, then refine only around that pair. If, say, samples 43 and 22 turn out to be the closest, you do the same thing you just did, but within the ranges 42-44 and 21-23 of the two graphs rather than over the whole thing. So if you want precision of one 10,000th of the graph, you do not perform 100,000,000 distance checks; you perform about 50,000, by first subdividing each graph into 100 parts (10,000 pairwise checks), then subdividing the ranges around the closest pair into roughly 200 points each and checking those pairwise distances (another 40,000). The caveat is that the refinement can lock onto the wrong local minimum if the coarse pass is too coarse, which is where knowing the order of the curves helps: the distance can't spontaneously start decreasing again more times than its critical-point system has roots, which Bézout's theorem bounds. Or use any of the other ways to find the nearest point to a Bézier curve (which is a much easier problem).
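Here's a minimal sketch of that coarse-to-fine search in Python; the curve representation (parametric callables over $[0,1]$), the helper names, and the 100/200 refinement schedule are just assumptions for illustration:

```python
import numpy as np

def closest_pair_coarse_to_fine(curve_a, curve_b, coarse=100, fine=200, rounds=3):
    """Coarse-to-fine nearest-pair search between two parametric curves.

    curve_a, curve_b: callables mapping a parameter array in [0, 1]
    to an (n, 2) array of points.  Returns (s, t, distance).
    """
    lo_a, hi_a, lo_b, hi_b = 0.0, 1.0, 0.0, 1.0
    k = coarse
    best = None
    for _ in range(rounds):
        s = np.linspace(lo_a, hi_a, k)
        t = np.linspace(lo_b, hi_b, k)
        pa, pb = curve_a(s), curve_b(t)                     # (k, 2) each
        # All pairwise distances between the two sample sets.
        d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=2)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        best = (s[i], t[j], d[i, j])
        # Shrink each range to one sample step on either side of the winner
        # and resample more finely in there.
        step_a, step_b = (hi_a - lo_a) / (k - 1), (hi_b - lo_b) / (k - 1)
        lo_a, hi_a = max(0.0, s[i] - step_a), min(1.0, s[i] + step_a)
        lo_b, hi_b = max(0.0, t[j] - step_b), min(1.0, t[j] + step_b)
        k = fine
    return best

# Example: two quadratic Bezier curves given by their control points.
def make_bezier(p0, p1, p2):
    p0, p1, p2 = map(np.asarray, (p0, p1, p2))
    def curve(t):
        t = np.asarray(t)[:, None]
        return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2
    return curve

a = make_bezier((0, 0), (1, 2), (2, 0))
b = make_bezier((0, 3), (1, 1.5), (2, 3))
print(closest_pair_coarse_to_fine(a, b))
```

Each round shrinks the search window to a couple of coarse steps around the current winner, so the total work grows with the number of rounds rather than quadratically in the final resolution.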
As far as the computer science goes, I think the best method is clearly Bézier subdivision of both curves: keep finding the pairs of bounding boxes that are nearest to each other, while intelligently pruning the pairs that can be provably ruled out. Namely, if the closest possible point of a bounding box pair is farther than the farthest possible points of the best known pair, then that part of the graph cannot possibly contain the shortest distance; pleading total ignorance, the curve could be anywhere inside its bounding box, so the boxes' minimum and maximum separations bound the true distance. Once we divide the graphs a few times, there is ample resolution to show that even the best-case distance of one pairing is worse than the worst-case distance of another pairing, so that pairing cannot possibly contain the closest point. Further, if we can show this is true between some bounding box on one graph and all the bounding boxes of the other graph, that entire section of the graph is eliminated.
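Here's a rough sketch of that branch-and-bound in Python, assuming each curve is a single Bézier segment given by its control points; the de Casteljau split and the box distance bounds are standard, but the queue discipline and the names are just one way to organize it, not a canonical implementation:

```python
import heapq
import numpy as np

def decasteljau_split(ctrl):
    """Split a Bezier segment (any degree) in half via de Casteljau."""
    ctrl = np.asarray(ctrl, dtype=float)
    left, right, pts = [ctrl[0]], [ctrl[-1]], ctrl
    while len(pts) > 1:
        pts = 0.5 * (pts[:-1] + pts[1:])
        left.append(pts[0])
        right.append(pts[-1])
    return np.array(left), np.array(right[::-1])

def box(ctrl):
    """Axis-aligned bounding box of the control points.
    By the convex hull property it also contains the curve."""
    return ctrl.min(axis=0), ctrl.max(axis=0)

def box_min_dist(b1, b2):
    """Smallest possible distance between two AABBs (0 if they overlap)."""
    gap = np.maximum(0.0, np.maximum(b1[0] - b2[1], b2[0] - b1[1]))
    return np.linalg.norm(gap)

def box_max_dist(b1, b2):
    """Largest possible distance between two AABBs."""
    span = np.maximum(np.abs(b1[1] - b2[0]), np.abs(b2[1] - b1[0]))
    return np.linalg.norm(span)

def closest_distance(ctrl_a, ctrl_b, tol=1e-9):
    """Branch and bound on pairs of subdivided Bezier segments."""
    a, b = np.asarray(ctrl_a, float), np.asarray(ctrl_b, float)
    upper = box_max_dist(box(a), box(b))           # best known worst case
    heap = [(box_min_dist(box(a), box(b)), 0, a, b)]
    counter = 1                                     # tie-breaker for heapq
    while heap:
        lower, _, ca, cb = heapq.heappop(heap)
        if lower > upper:        # even the best case here loses: prune
            continue
        if upper - lower < tol:  # bounds have converged
            return lower
        # Subdivide both segments and push the four child pairings.
        for na in decasteljau_split(ca):
            for nb in decasteljau_split(cb):
                ba, bb = box(na), box(nb)
                lo, hi = box_min_dist(ba, bb), box_max_dist(ba, bb)
                upper = min(upper, hi)
                if lo <= upper:
                    heapq.heappush(heap, (lo, counter, na, nb))
                    counter += 1
    return upper

# Example: two cubic segments that come close near their middles.
A = [(0, 0), (1, 2), (2, 2), (3, 0)]
B = [(0, 3), (1, 1.8), (2, 1.8), (3, 3)]
print(closest_distance(A, B))
```

The popped lower bound only ever increases and the shared upper bound only ever decreases, so the gap between them tells you at any moment how accurate the current answer is, and you can stop at whatever tolerance you like.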
This method actually converges fairly quickly, since we can discard entire sections of the graph early, or, in the case of many graphs, entire graphs, without checking them beyond a bounding box test. In the case where the two graphs intersect, the overlapping bounding boxes are quickly subjected to repeated subdivision, driving the worst-case distance down to zero or epsilon, which in turn lets us disregard all the rest of the graph unless there is another intersection.
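As a quick illustration of the intersection case, reusing the hypothetical `closest_distance` sketch above on two segments that actually cross:

```python
# Reusing closest_distance from the previous sketch: these two cubics cross,
# so the reported minimum distance collapses to ~0 within the tolerance.
C = [(0, 0), (1, 2), (2, 2), (3, 0)]
D = [(0, 2), (1, 0), (2, 0), (3, 2)]
print(closest_distance(C, D))   # ~0.0
```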
This also combines easily with a bunch of very nice and ultimately clever tricks from computational geometry for determining distances and overlap between axis-aligned bounding boxes (AABBs). While it's certainly the case that a Bézier curve always lies within the convex hull of its control points, and the bounding box covers more area than that hull, it's easier and likely computationally cheaper to just use the bounding box of the curve. Especially when we consider the tricks (spatial indexing, sorting, sweeps) that can locate the nearest rectangle in less than $n$ time per query. If this sort of thing is mission critical, you could certainly apply enough of these tricks to find every answer within some arbitrary error bound in imperceptibly short periods of time.
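To make the hull-versus-box point concrete, here is a small comparison (again Python, with made-up function names) of the cheap control-point box against the exact box of a cubic Bézier; the exact one needs the roots of the derivative on each axis, which is the extra work the cheap version skips:

```python
import numpy as np

def control_point_box(ctrl):
    """Cheap AABB: min/max of the control points.
    Contains the curve by the convex hull property, but may be loose."""
    ctrl = np.asarray(ctrl, float)
    return ctrl.min(axis=0), ctrl.max(axis=0)

def exact_cubic_box(ctrl):
    """Tight AABB of a cubic Bezier: evaluate the endpoints plus the
    interior roots of B'(t) = 0 on each axis (a quadratic per axis)."""
    p0, p1, p2, p3 = map(np.asarray, ctrl)
    candidates = [0.0, 1.0]
    # B'(t) = 3[(1-t)^2 (p1-p0) + 2(1-t)t (p2-p1) + t^2 (p3-p2)],
    # which per axis is the quadratic a t^2 + b t + c below.
    a = 3 * (-p0 + 3 * p1 - 3 * p2 + p3)
    b = 6 * (p0 - 2 * p1 + p2)
    c = 3 * (p1 - p0)
    for axis in range(len(p0)):
        roots = np.roots([a[axis], b[axis], c[axis]])
        candidates += [r.real for r in roots
                       if abs(r.imag) < 1e-12 and 0 < r.real < 1]
    t = np.array(candidates)[:, None]
    pts = ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
           + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)
    return pts.min(axis=0), pts.max(axis=0)

ctrl = [(0, 0), (1, 3), (2, -3), (3, 0)]
print(control_point_box(ctrl))  # loose: spans y in [-3, 3]
print(exact_cubic_box(ctrl))    # tight: the curve stays well inside that
```

For these control points the cheap box spans $y \in [-3, 3]$ while the exact box is only about $y \in [-0.87, 0.87]$; the looser box just costs a few extra subdivisions before pruning kicks in, which is usually cheaper than solving for derivative roots at every split.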