2

Suppose we construct an Erdős–Rényi graph $G(n, p)$.

Fix two nodes $u$ and $v$. What is the probability that there is no path connecting the two nodes?


My take: I tried to model the problem as $P(\text{no path between } u \text{ and } v) = 1 - P(\text{at least one path between } u \text{ and } v)$.

  • If $n = 2$: $P(\text{no path between } u \text{ and } v) = 1-p$
  • If $n = 3$: $P(\text{no path between } u \text{ and } v) = 1 - (p + p^2 - p^3)$
  • If $n$ is an arbitrary number: I will try to compute the probability that a path exists.
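
The two small cases above can be checked by brute force. The following is a minimal Python sketch, not part of the original post: it enumerates every graph on $n$ labelled vertices, weights each by its probability under $G(n,p)$, and sums the weight of those with no $u$-$v$ path. The function name `prob_no_path` and the choice $p = 0.3$ are assumptions made for illustration, and the enumeration is only feasible for very small $n$.

```python
from itertools import combinations, product


def prob_no_path(n, p, u=0, v=1):
    """Exact P(no u-v path) in G(n, p), by enumerating all 2^C(n,2) graphs."""
    pairs = list(combinations(range(n), 2))
    total = 0.0
    for present in product([False, True], repeat=len(pairs)):
        adj = {w: set() for w in range(n)}
        weight = 1.0
        for (a, b), on in zip(pairs, present):
            weight *= p if on else 1 - p  # probability of this edge pattern
            if on:
                adj[a].add(b)
                adj[b].add(a)
        # Depth-first search from u to find everything reachable from it.
        seen, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if v not in seen:
            total += weight
    return total


p = 0.3  # arbitrary illustrative value
print(prob_no_path(2, p), 1 - p)
print(prob_no_path(3, p), 1 - (p + p**2 - p**3))
```

Each printed pair should agree up to floating-point rounding.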

Update: the following is incorrect.

Let $Z_{u \rightarrow v}$ be a boolean variable indicating whether there is a path from $u$ to $v$: $Z_{u \rightarrow v} = 1$ if and only if there is a path from $u$ to $v$. The goal is to find the probability of having a path from $u$ to $v$, in other words $\mathbb{P}(Z_{u \rightarrow v})$. Instead of calculating this probability directly, we use the expected number of nodes other than $u$ that can be reached from $u$: $$ \mathbb{P}(Z_{u \rightarrow v}) = \frac{ \mathbb{E} \left( Z_{u \rightarrow ? \neq u} \right) }{ n - 1 } $$

Next we calculate $\mathbb{E} \left( Z_{u \rightarrow ?} \right)$. Suppose we start from $u$ and visit its neighbors, then repeat the same procedure for each of those neighbors (as in BFS). Suppose we have seen all nodes within distance $k$ of $u$. Define $S_k$ to be the number of nodes reachable from $u$ in at most $k$ steps. Suppose there is an ordering in which this procedure visits the nodes, and that we sort them accordingly. Let $X_1$ be the number of nodes at distance 1, $X_2$ the number of nodes at distance 2, and so on. Hence, with $S_0 = 0$, $$ S_i = \sum_{j=1}^i X_j $$

The distribution of $X_{i+1}$ given the previously observed nodes is binomial, $ \text{Bin} \left( n - 1 - S_i, p \right)$; in other words, the probability of observing $X_{i+1} = k$ nodes at distance $i+1$ from the starting point is ${ n - 1 -S_i \choose k }p^{k}(1-p)^{n - 1 -S_i-k}$. With this, the expected number of newly discovered nodes at distance $i+1$ is $\mathbb{E}_{X_{i+1}|S_i} \left( X_{i+1}|S_i \right) = (n - 1 - S_i)p$. Using the law of total expectation: \begin{align*} \mathbb{E}X_{i+1} &= \mathbb{E}_{S_i} \left[ \mathbb{E}_{X_{i+1} | S_i } \left( X_{i+1} | S_i \right) \right] \\ &= \mathbb{E}_{S_i} \left[ n - 1 - S_i \right] p = (n-1)p - p \mathbb{E} S_i \end{align*}

Unrolling the recursion with $\mathbb{E}S_0 = 0$: \begin{align*} \mathbb{E}{S_{i+1}} &= \mathbb{E}{S_i} + \mathbb{E}{X_{i+1}} \\ &= (n-1)p + (1-p) \mathbb{E}{S_i} \\ &= (n-1)p \sum_{j= 0}^i (1-p)^j \\ &= (n-1)p \frac{ 1 - (1-p)^{i+1} }{ 1 - (1-p) } \\ &= (n-1) \left( 1 - (1-p)^{i+1} \right) \end{align*}

Therefore, taking $i + 1 = n - 1$, $\mathbb{P}(Z_{u \rightarrow v}) = \frac{n-1}{n-1} \left[ 1 - (1-p)^{(n-1)} \right] $ and $P(\text{no path between } u \text{ and } v) = (1-p)^{(n-1)}$.

What confuses me is that this formula does not agree with the special case $n=3$ that I calculated above, and I am not sure why.

Misha Lavrov
  • 159,700
Daniel
  • 2,760
  • The distribution of $X_{i+1}$ given the previous nodes is $\text{Bin}(n - 1 - S_i, 1 - (1-p)^{X_i})$, since a node counted by $X_{i+1}$ may be reached by any of the nodes counted by $X_i$. – Misha Lavrov Jun 13 '18 at 21:30
  • @MishaLavrov can you expand on this comment a little bit. Not sure I understand it. – Daniel Jun 15 '18 at 21:52
  • If you've found the nodes at distance $1,2,\dots,i$ from the starting node $u$, then the nodes at distance $i+1$ are those of the $n-1-S_i$ remaining nodes which are adjacent to one of the $X_i$ nodes at distance $i$. For each remaining node, the probability it is not adjacent to any of those nodes is $(1-p)^{X_i}$, so the probability that it is adjacent to one of them is $1-(1-p)^{X_i}$. – Misha Lavrov Jun 15 '18 at 23:25
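
The corrected layer distribution in the comments can be turned into an exact computation. Below is a minimal Python sketch under stated assumptions: it tracks the pair (number of unreached nodes, size of the current layer), draws the next layer from $\text{Bin}(n-1-S_i,\ 1-(1-p)^{X_i})$, and divides the expected number of reached nodes by $n-1$, as in the question. The names `prob_path` and `expected_new` are hypothetical and do not come from the original thread.

```python
from functools import lru_cache
from math import comb


def prob_path(n, p):
    """P(u and v are connected) in G(n, p) via exact layer-by-layer exploration."""

    @lru_cache(maxsize=None)
    def expected_new(remaining, frontier):
        # Expected number of currently unreached nodes that will eventually be
        # reached, given `remaining` unreached nodes and `frontier` nodes in
        # the most recently discovered layer.
        if remaining == 0 or frontier == 0:
            return 0.0
        # A remaining node joins the next layer if it is adjacent to at least
        # one frontier node.
        q = 1 - (1 - p) ** frontier
        total = 0.0
        for k in range(remaining + 1):
            pmf = comb(remaining, k) * q**k * (1 - q) ** (remaining - k)
            total += pmf * (k + expected_new(remaining - k, k))
        return total

    # By symmetry, each of the other n - 1 nodes is reached with equal probability.
    return expected_new(n - 1, 1) / (n - 1)


p = 0.5
print(prob_path(3, p), p + p**2 - p**3)  # both values should be 0.625
```

For $n = 3$ this reproduces $p + p^2 - p^3$, which the simpler $\text{Bin}(n-1-S_i, p)$ model does not.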

3 Answers

2

If you want the exact probability, then we can reduce this to the problem of finding the exact probability that a graph is connected. If $f(n,p)$ is that probability, then the probability that two vertices $u,v$ are disconnected is $$\sum_{i+j+k = n} \binom{n-2}{i-1,j-1,k} f(i,p) f(j,p) (1-p)^{ij + ik + jk}$$ where the sum runs over $i, j \ge 1$ and $k \ge 0$ and covers all the possibilities: the first vertex is in a component of size $i$, the second is in a component of size $j$, and there are $k$ vertices left over.
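
The sum above can be sketched in Python as follows. This is an illustration rather than a definitive implementation: the answer does not say how to compute $f(n,p)$, so the sketch assumes the standard recurrence obtained by conditioning on the size $i$ of the component containing a fixed vertex, $f(m) = 1 - \sum_{i=1}^{m-1} \binom{m-1}{i-1} f(i) (1-p)^{i(m-i)}$. The name `prob_disconnected` is hypothetical.

```python
from functools import lru_cache
from math import comb, factorial


def prob_disconnected(n, p):
    """P(two fixed vertices lie in different components of G(n, p))."""

    @lru_cache(maxsize=None)
    def f(m):
        # Assumed recurrence for P(G(m, p) is connected): subtract the
        # probability that a fixed vertex sits in a component of size i < m.
        return 1.0 - sum(
            comb(m - 1, i - 1) * f(i) * (1 - p) ** (i * (m - i))
            for i in range(1, m)
        )

    total = 0.0
    for i in range(1, n):              # size of the component containing u
        for j in range(1, n - i + 1):  # size of the component containing v
            k = n - i - j              # vertices in neither component
            multinomial = factorial(n - 2) // (
                factorial(i - 1) * factorial(j - 1) * factorial(k)
            )
            total += multinomial * f(i) * f(j) * (1 - p) ** (i * j + i * k + j * k)
    return total


print(prob_disconnected(3, 0.5))  # 0.375, i.e. 1 - (p + p^2 - p^3) at p = 0.5
```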

But this is a very unnatural question to ask. Generally, with random graphs, we look for asymptotic answers as $n \to \infty$, because the exact answers are not very nice and not very useful.

Asymptotically, for $p \ge \frac{1+\epsilon}n$, the random graph has a giant component of linear size, and all remaining components are sublinear. So up to lower-order terms, the probability that two vertices $u,v$ are connected is the probability that both lie in the giant component. If $p \sim \frac cn$ with $c > 1$, then $(x + o(1))n$ vertices are in the giant component w.h.p., where $x$ is the positive solution of $1-x= e^{-cx}$; the probability that $u$ and $v$ are connected is therefore $x^2 + o(1)$, and the probability that they are not connected is $1 - x^2+o(1)$.
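
The giant-component fraction $x$ has no closed form, but it is easy to approximate numerically. Here is a minimal Python sketch that solves $1 - x = e^{-cx}$ by fixed-point iteration starting from $x = 1$, then reports $x^2$, the limiting probability that both vertices lie in the giant component. The function name `giant_fraction`, the tolerance, and the example value $c = 2$ are assumptions chosen for illustration.

```python
from math import exp


def giant_fraction(c, tol=1e-12, max_iter=10_000):
    """Approximate the largest solution x in [0, 1] of 1 - x = exp(-c x)."""
    x = 1.0  # starting from 1 makes the iteration settle on the positive root when c > 1
    for _ in range(max_iter):
        x_new = 1.0 - exp(-c * x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x  # may not have fully converged (e.g. very close to c = 1)


c = 2.0  # illustrative mean degree
x = giant_fraction(c)
print(x)       # fraction of vertices in the giant component, about 0.797
print(x ** 2)  # limiting probability that two fixed vertices are both in it
```

For $c \le 1$ the iteration drifts to $x = 0$, matching the absence of a giant component.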

In particular, if $p \gg \frac1n$, the probability that $u$ and $v$ are not connected tends to $0$ as $n \to \infty$.

If $p \le \frac1n$, there is no giant component, and the probability that $u$ and $v$ are not connected tends to $1$ as $n \to \infty$.

Misha Lavrov
  • 159,700
1

It turns out the question is not that simple. This is a partial answer, which I will try to complete over time.

Based on this paper, you can think of it as a layer-wise exploration of all the nodes (i.e. the nodes in each layer are equidistant from the starting node).


The paper models the distribution of the shortest path $P(d_{ij} = k)$, and provides a recursive formulation for the distribution (and its approximation).

As a side note, the mean distance in ER random graphs is known to be $$ \langle d\rangle = \sum_i i \times P(d_{ij} = i) = \frac{\log n}{ \log c} + o(1) $$ (with $c$ the mean degree), although this is not what the question was asking for.

It also turns out that the diameter, i.e. the largest shortest-path distance between any pair of nodes in the network, is almost the same as the average distance.

Daniel
  • 2,760
1

Using the method of @Misha Lavrov, I wrote some Mathematica code to calculate the answer:

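p = 0.5;
(* f[n]: probability that G(n, p) is connected; memoized, so clear f before changing p *)
f[n_] := f[n] = 1 - 
   Sum[f[i]*Binomial[n - 1, i - 1]*(1 - p)^(i*(n - i)), {i, 1, n - 1}];
Table[f[n], {n, 1, 10}]
(* probability that two fixed vertices lie in different components: *)
(* sum over component sizes i, j >= 1, with k = n - i - j vertices left over *)
TheTwoCertainVerticesNotConnected[n_] := 
  Sum[Multinomial[i - 1, j - 1, n - i - j]*f[i]*f[j]*
    (1 - p)^(i*j + (i + j)*(n - i - j)), {i, 1, n - 1}, {j, 1, n - i}];
(* probability that the two vertices are connected, for n = 2, ..., 10 *)
Table[1 - TheTwoCertainVerticesNotConnected[n], {n, 2, 10}]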

Values of 1 - TheTwoCertainVerticesNotConnected[n], i.e. the probability that the two vertices are connected, for n = 2, ..., 10 with p = 0.5:

$$ \{0.5,0.625,0.75,0.853516,0.923584,0.963058,0.982573,0.991691,0.995962\} $$
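
As a rough sanity check on these numbers, one can also estimate the connection probability by simulation. The following Python sketch is not part of the original Mathematica code: it samples $G(n,p)$ repeatedly, merges the endpoints of each sampled edge with a small union-find structure, and counts how often vertices 0 and 1 end up in the same component. The function name `estimate_connected`, the trial count, and the seed are assumptions for illustration.

```python
import random


def estimate_connected(n, p, trials=20000, seed=0):
    """Monte Carlo estimate of P(vertices 0 and 1 are connected) in G(n, p)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        parent = list(range(n))  # union-find forest, one tree per component

        def find(a):
            # Follow parent pointers to the root, halving the path as we go.
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a

        for a in range(n):
            for b in range(a + 1, n):
                if rng.random() < p:  # edge {a, b} is present
                    parent[find(a)] = find(b)
        hits += find(0) == find(1)
    return hits / trials


for n in (2, 3, 4, 5):
    print(n, estimate_connected(n, 0.5))
```

With 20000 trials the estimates should land within about 0.01 of the exact values listed above.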