Generate spanning tree by adding random edges

Question

The problem is that we have $M$ nodes, and at each step we add an edge between $i$ and $j$, both of which are chosen uniformly at random. I wonder what the probability is that a spanning tree exists after adding $N$ edges.

Thank you for all the comments! I think it is different from the question in Exact probability of random graph being connected. In this problem, we are adding edges one at a time, and the same edge may be chosen multiple times. I guess there should be some results, but I haven't found any.

So you are asking for the probability of the graph being connected? — Rushabh Mehta, Apr 17 '19 at 18:59
@DonThousand As an exact probability, the question you linked to is not a good answer to this question, because the random graph model is different: adding all edges with probability $p$ rather than choosing them one at a time. Asymptotically, the thresholds are the same, but that has nothing to do with the (rather unhelpful) exact formula. — Misha Lavrov, Apr 17 '19 at 19:18
@MishaLavrov Good point. However, I stand by my claim that this question is a duplicate. Maybe not of the question I linked. — Rushabh Mehta, Apr 17 '19 at 19:20
@DonThousand Find the duplicate, then! I have not found any; this is a well-known result, but it might not be one that already appears on MSE. — Misha Lavrov, Apr 17 '19 at 19:28

score 3 · Accepted Answer · answered Apr 18 '19 at 04:04

There are no good exact answers.

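For values of $N$ close to $M$, we can write down something intelligent. For example, when $N=M-1$, the total number of connected graphs with $M-1$ edges and $M$ vertices is $M^{M-2}$ (the number of labeled trees). Each tree arises from $(M-1)!\,2^{M-1}$ of the $M^{2(M-1)}$ equally likely sequences of ordered pairs $(i,j)$ (order the $M-1$ edges, then orient each one), so the probability is $$ \frac{M^{M-2}\,(M-1)!\,2^{M-1}}{M^{2M-2}} = \frac{2^{M-1}\,(M-1)!}{M^{M}}. $$ As a check, $M=2$ gives $\frac12$: of the four ordered pairs, only $(1,2)$ and $(2,1)$ connect the two vertices. (Assuming that you allow loops - edges with $i=j$ - which you seem to. Also, it's very confusing that you are using $M$ to denote vertices and $N$ to denote edges rather than the other way around, but I'll stick to your notation to avoid causing even more confusion.)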
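For very small $M$, the $N = M-1$ case can be sanity-checked by enumerating every sequence of ordered pairs directly. The following is only a sketch under the model assumed in this answer (each edge is an ordered pair $(i,j)$ drawn uniformly from all $M^2$ pairs, loops allowed); the function names are made up for illustration, and the enumeration has $(M^2)^N$ cases, so it is hopeless beyond tiny inputs.

```python
from fractions import Fraction
from itertools import product


def is_connected(num_vertices, edges):
    """Union-find connectivity check; loops and repeated edges are harmless."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_vertices
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return components == 1


def exact_probability_bruteforce(num_vertices, num_edges):
    """Enumerate all (M^2)^N sequences of ordered pairs and count connected ones."""
    pairs = list(product(range(num_vertices), repeat=2))
    favorable = sum(
        is_connected(num_vertices, seq)
        for seq in product(pairs, repeat=num_edges)
    )
    return Fraction(favorable, len(pairs) ** num_edges)


if __name__ == "__main__":
    for m in (2, 3, 4):
        print(m, exact_probability_bruteforce(m, m - 1))
```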

Analogously to the exact formula for $G(n,p)$, we can write a recursive formula for the probability that is an exact but useless answer: $$ f(M,N) = 1 - \sum_{i=1}^{M-1} \binom{M-1}{i-1} \sum_{j=0}^N \binom{N}{j} \left(\frac{i^2}{M^2}\right)^j \left(\frac{(M-i)^2}{M^2}\right)^{N-j} f(i,j). $$ The idea here is to sum over all ways to choose $i-1$ vertices to be in the same connected component as the first vertex, and then to choose $j$ edges to be in that component. Then the probability we get is the probability that those $j$ edges are between vertices in the connected component, that the other $N-j$ edges are not incident to those vertices, and that the connected component actually is connected. Over all sizes $i = 1, \dots, M$ these probabilities sum to $1$, and the $i = M$ term is $f(M,N)$ itself, so the graph is connected exactly when none of the cases with $i < M$ occurs.
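A recursion of this kind can be evaluated exactly for small inputs. The sketch below uses the fact that the probabilities over all possible sizes $i = 1, \dots, M$ of the first vertex's component sum to $1$, with the $i = M$ case being "connected", so it computes $1$ minus the sum over $i < M$. The function name and the base cases ($f(1,j) = 1$, and $f(i,j) = 0$ when $j < i-1$) are my own choices for illustration, not something spelled out above.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def connected_probability(m, n):
    """Exact probability that n uniform ordered pairs on m vertices connect the graph."""
    if m == 1:
        return Fraction(1)  # a single vertex is connected; loops do not matter
    if n < m - 1:
        return Fraction(0)  # too few edges to connect m vertices
    disconnected = Fraction(0)
    for i in range(1, m):  # size of the first vertex's component, strictly below m
        inside = Fraction(i * i, m * m)          # edge lands inside the component
        outside = Fraction((m - i) ** 2, m * m)  # edge avoids the component entirely
        for j in range(n + 1):                   # number of edges inside the component
            disconnected += (
                comb(m - 1, i - 1)
                * comb(n, j)
                * inside ** j
                * outside ** (n - j)
                * connected_probability(i, j)
            )
    return 1 - disconnected


if __name__ == "__main__":
    for m, n in [(2, 1), (3, 2), (4, 3), (5, 8)]:
        p = connected_probability(m, n)
        print(f"M={m}, N={n}: {p} (about {float(p):.4f})")
```

The cost grows roughly like $M^2 N^2$ with exact rational arithmetic, which is fine for a quick check but illustrates why the formula is not much practical use for large $M$.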

We can, however, get very good approximations as $M \to \infty$. (These should be pretty good for $M$ that are not all that large.) The key turns out to be the number of isolated vertices. Setting $N = \frac12 M(\log M + C)$ for a new parameter $C$, we get that the probability that a given vertex is isolated is $$ \left(1 - \frac{2M-2}{M^2}\right)^N \sim \exp\left(-\frac2M \cdot N\right) = \exp(-\log M - C) = \frac{e^{-C}}{M}. $$ So the expected number of isolated vertices is $e^{-C}$. Though these events are not independent, they are very close to independent, and so the distribution of the number of isolated vertices is asymptotically Poisson with mean $e^{-C}$; therefore the probability tends to $e^{-e^{-C}}$ that there are no isolated vertices.
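To see how quickly the approximation for a single vertex kicks in, one can compare $\left(1 - \frac{2M-2}{M^2}\right)^N$ with $\frac{e^{-C}}{M}$ numerically. This is just an illustrative sketch: the function name is hypothetical, and $N = \frac12 M(\log M + C)$ is left as a real number rather than rounded to an integer.

```python
import math


def isolated_probability(m, c):
    """Return (exact, approximate) probability that a fixed vertex is isolated."""
    n = 0.5 * m * (math.log(m) + c)       # number of edges, not rounded
    exact = (1 - (2 * m - 2) / m**2) ** n
    approx = math.exp(-c) / m
    return exact, approx


if __name__ == "__main__":
    for m in (10, 100, 1000, 10000):
        exact, approx = isolated_probability(m, c=1.0)
        # m * exact is the expected number of isolated vertices
        print(f"M={m:6d}  exact={exact:.6e}  approx={approx:.6e}  "
              f"expected isolated={m * exact:.4f}")
```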

Meanwhile, more complicated connected components have vanished by the time $N$ is about $\frac12 M \log M$. For example, for a connected component of order $2$ you have to pay about the same price for all the missing edges (the expected number would be about $e^{-2C}$ if that's all that mattered) but also the edge connecting the two vertices you pick has to be present (for a penalty roughly on the order of $O(\log M/M)$). So with high probability the only obstacle to connectivity is an isolated vertex, and $e^{-e^{-C}}$ is also the limiting probability that the graph is connected.

This means that when $N$ is much smaller than $\frac12 M \log M$, the graph is almost always disconnected; when $N$ is much larger, the graph is almost always connected.
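A quick Monte Carlo experiment can be used to compare the simulated frequency of connectivity with $e^{-e^{-C}}$ for moderate $M$. The sketch below assumes the same model as above (each edge is a uniformly random ordered pair, loops and repeats allowed); the function names, the choice $M = 300$, the trial count and the seed are arbitrary illustrative choices, and the simulated values carry sampling noise.

```python
import math
import random


def is_connected_after(num_vertices, num_edges, rng):
    """Add num_edges uniform random pairs and report whether the graph is connected."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_vertices
    for _ in range(num_edges):
        ri = find(rng.randrange(num_vertices))
        rj = find(rng.randrange(num_vertices))
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return components == 1


def estimate_connectivity(num_vertices, c, trials=500, seed=0):
    """Return (simulated frequency, exp(-exp(-c))) for N = M(log M + c)/2 edges."""
    rng = random.Random(seed)
    num_edges = round(0.5 * num_vertices * (math.log(num_vertices) + c))
    hits = sum(
        is_connected_after(num_vertices, num_edges, rng) for _ in range(trials)
    )
    return hits / trials, math.exp(-math.exp(-c))


if __name__ == "__main__":
    for c in (-1.0, 0.0, 1.0, 2.0):
        simulated, limit = estimate_connectivity(300, c)
        print(f"C={c:+.1f}  simulated={simulated:.3f}  limit={limit:.3f}")
```

Taking $C$ very negative or very positive in this experiment corresponds to the two regimes described above.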

A more detailed proof can be found as Theorem 4.1 here.
