Longest path length in an undirected tree, can we prove this algorithm is correct (which it is)?

Question

Hello, I solved this LeetCode question, https://leetcode.com/problems/tree-diameter/, which is reserved for paying subscribers.

The question:

Given an undirected tree (the tree is connected), return its diameter: the number of edges in a longest path in that tree.

The tree is given as an array of edges where edges[i] = [u, v] is a bidirectional edge between nodes u and v. Each node has labels in the set {0, 1, ..., edges.length}.

The "easy" way to solve it is to start a DFS at each node in the tree and keep track of the longest path seen so far, which is O(n^2), where n is the number of vertices in the tree.

However, I came up with the solution below, which does the job in O(n). I found it by working through examples on paper, then coded it, and it passed all the tests. What I can't figure out is why it formally works. Can anyone help me prove it is correct?
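For comparison, here is a minimal sketch of that quadratic baseline. The function name `tree_diameter_bruteforce` and the helper `eccentricity` are made up for illustration, and the sketch uses a BFS per vertex instead of a DFS (in a tree both give the same distances, and BFS avoids deep recursion).

```python
from collections import defaultdict, deque
from typing import List


def tree_diameter_bruteforce(edges: List[List[int]]) -> int:
    """O(n^2) baseline: run one search from every vertex."""
    n = len(edges) + 1  # a tree with n vertices has n - 1 edges
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)

    def eccentricity(start: int) -> int:
        # Distance (in edges) from start to the farthest vertex.
        dist = {start: 0}
        queue = deque([start])
        farthest = 0
        while queue:
            node = queue.popleft()
            farthest = max(farthest, dist[node])
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return farthest

    # The diameter is the largest eccentricity over all vertices.
    return max(eccentricity(v) for v in range(n))
```

A slow reference like this is mainly useful for cross-checking a faster solution on small random trees.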

Here is my code:

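    import collections
    from typing import List

    class Solution:  # LeetCode-style wrapper so the method is runnable
        def treeDiameter(self, edges: List[List[int]]) -> int:
            def dfs(graph, vertex, prev, maximums):
                # maximums[vertex] keeps the two longest downward path
                # lengths from vertex, each through a different child.
                for neighbor in graph[vertex]:
                    if neighbor != prev:  # do not walk back to the parent
                        tmp = 1 + dfs(graph, neighbor, vertex, maximums)
                        if tmp > maximums[vertex][0]:
                            maximums[vertex][1] = maximums[vertex][0]
                            maximums[vertex][0] = tmp
                        elif tmp > maximums[vertex][1]:
                            maximums[vertex][1] = tmp
                # Longest downward path starting at vertex.
                return maximums[vertex][0]

            # Build an adjacency list for the undirected tree.
            graph = collections.defaultdict(list)
            for vertex1, vertex2 in edges:
                graph[vertex1].append(vertex2)
                graph[vertex2].append(vertex1)

            maximums = [[0, 0] for _ in range(len(edges) + 1)]
            dfs(graph, 0, None, maximums)

            # Best path through any vertex: its two longest branches joined.
            res = 0
            for max1, max2 in maximums:
                res = max(res, max1 + max2)
            return res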

The way I came up with this: I realized that if I start my DFS at a node that is on the longest path, then the longest path is equal to the sum of the two longest paths out of this node (because it is a tree, there are no cycles). That is why I store the two longest paths out of each node, max1 and max2, in the maximums array. What I fail to see is why this works in all cases.

nitangle · Answer 1 · 2020-02-27T14:03:46.550

I think this might help. Copied from https://stackoverflow.com/questions/20010472/proof-of-correctness-algorithm-for-diameter-of-a-tree-in-graph-theory.

Let s, t be a maximally distant pair. Let u be an arbitrary vertex. We have a schematic like,

    u
    |
    |
    |
    x
   / \
  /   \
 /     \
s       t

where x is the junction of s, t, u (i.e. the unique vertex that lies on each of the three paths between these vertices).

Suppose that v is a vertex maximally distant from u. If the schematic now looks like

    u
    |
    |
    |
    x   v
   / \ /
  /   *
 /     \
s       t ,

then

d(s, t) = d(s, x) + d(x, t) <= d(s, x) + d(x, v) = d(s, v), where the inequality holds because d(u, t) <= d(u, v) by the choice of v, while d(u, t) = d(u, x) + d(x, t) and d(u, v) = d(u, x) + d(x, v), so d(x, t) <= d(x, v). There is a symmetric case where v attaches between s and x instead of between x and t.

The other case looks like

    u
    |
    *---v
    |
    x
   / \
  /   \
 /     \
s       t .

Now,

d(u, s) <= d(u, v) <= d(u, x) + d(x, v)

d(u, t) <= d(u, v) <= d(u, x) + d(x, v)

d(s, t) = d(s, x) + d(x, t) = d(u, s) + d(u, t) - 2 d(u, x) <= 2 d(x, v)

2 d(s, t) <= d(s, t) + 2 d(x, v) = d(s, x) + d(x, v) + d(v, x) + d(x, t) = d(v, s) + d(v, t), so max(d(v, s), d(v, t)) >= d(s, t) by an averaging argument, and v belongs to a maximally distant pair.
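The argument above is the correctness proof for the double-sweep method: search from an arbitrary vertex u to find a farthest vertex v, then search again from v; the largest distance found in the second search is the diameter. A minimal sketch follows, assuming the same `edges` input format as the question. The names `farthest_from` and `tree_diameter_two_sweeps` are made up for illustration.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple


def farthest_from(graph: Dict[int, List[int]], start: int) -> Tuple[int, int]:
    """BFS from start; return (a farthest vertex, its distance in edges)."""
    dist = {start: 0}
    queue = deque([start])
    far_node = start
    while queue:
        node = queue.popleft()
        if dist[node] > dist[far_node]:
            far_node = node
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return far_node, dist[far_node]


def tree_diameter_two_sweeps(edges: List[List[int]]) -> int:
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
        graph[b].append(a)

    # First sweep: u = 0 is the arbitrary vertex, v is maximally distant from u.
    v, _ = farthest_from(graph, 0)
    # Second sweep: by the proof, v is one end of a maximally distant pair.
    _, diameter = farthest_from(graph, v)
    return diameter
```

Both sweeps visit every vertex once, so the total running time is O(n).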

PS: I saw many answers after searching for this question. I do not have enough reputation to comment; otherwise I would have done so.

score 0 · Answer 2 · answered Nov 23 '20 at 16:51

I think this is actually a much more natural algorithm than the "standard" one involving two BFSes, linked by nitangle :)

One way to interpret it: The $i$-th entry in maximums stores the lengths of the two longest paths emanating from vertex $i$ that (a) have distinct second vertices and (b) avoid a specific vertex (namely, $i$'s parent in the DFS). The sum of these two lengths is the length of the longest path that has vertex $i$ as its "highest point", if we imagine picking up the tree by the DFS root (vertex 0), and recursing "down" to a vertex's children.

Since every path in a tree has a "highest point" -- including the overall longest path -- taking the maximum over all $n$ of these maxima will find the overall longest path.
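The same argument can be written as a formula. The notation here ($h$, $c_j$, $a_1$, $a_2$) is introduced only for illustration. Root the tree at vertex 0, let $c_1, \dots, c_k$ be the children of vertex $i$, and let $h(i)$ be the length of the longest downward path from $i$:

$$h(i) = \begin{cases} 0 & \text{if } i \text{ has no children,} \\ 1 + \max_{j} h(c_j) & \text{otherwise.} \end{cases}$$

Let $a_1(i) \ge a_2(i)$ be the two largest of the values $1 + h(c_j)$, with missing values taken as $0$. These are exactly the two entries the DFS stores for vertex $i$, and

$$\text{diameter} = \max_{i} \bigl( a_1(i) + a_2(i) \bigr).$$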

Another note: It's not necessary to store the complete set of per-vertex maximum pairs in an array and then look for the overall maximum among them at the end -- it would suffice to compute, within each call to dfs(), that vertex's maximum pair, and then once all that vertex's children are processed, update a global maximum if this pair beats the incumbent solution. This reduces the additional space usage, though not asymptotically (since if the tree is a path and the root is one of its endpoints, $O(n)$ space will still be needed for the dfs() stack frames).
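A minimal sketch of that variant, assuming the same input format as the question. The function name `tree_diameter_running_max` and the variable `best` are made up for illustration. Note that it still recurses, so a very deep tree could exceed Python's default recursion limit.

```python
from collections import defaultdict
from typing import List, Optional


def tree_diameter_running_max(edges: List[List[int]]) -> int:
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)

    best = 0  # running global maximum, replaces the per-vertex array

    def dfs(vertex: int, parent: Optional[int]) -> int:
        nonlocal best
        longest = second = 0  # two longest downward branches from vertex
        for neighbor in graph[vertex]:
            if neighbor != parent:
                depth = 1 + dfs(neighbor, vertex)
                if depth > longest:
                    longest, second = depth, longest
                elif depth > second:
                    second = depth
        # All children processed: the best path with vertex as its
        # highest point joins the two longest branches.
        best = max(best, longest + second)
        return longest

    dfs(0, None)
    return best
```

Only two integers per active stack frame are kept, instead of one pair per vertex for the whole run.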
