Proof for BFS and DFS equivalence

Question

I'm trying to prove (by induction) that BFS is equivalent to DFS, in the sense that they return the same set of visited nodes, but I'm stuck in the middle of some of the cases.

Let $G$ be a directed graph and $u \in V(G)$.

We want to prove that $ BFS(G,u) = DFS(G,u)$.

$\text{BASE CASE}$

$V(G) = \{u\}$

$BFS(G, u) = \{u\} \quad DFS(G,u) = \{u\}$

$BFS(G,u) = DFS(G,u) \quad Qed.$

$\text{INDUCTIVE HYPOTHESIS}$

$BFS(G,u) = DFS(G,u) \;\;\forall G, u \in V(G)$.

$\text{INDUCTIVE STEP}$

$(i)\;G' = (V(G) \cup\{v\}, E(G))$

We know that $BFS(G',u) = BFS(G,u)$ if $u \ne v$ (and likewise for $DFS$), because $v$ is isolated from the rest of the graph, so the claim follows from the hypothesis.

But what if $u = v$?

$(ii)\;G' = (V(G), E(G) \cup \{(v,w)\}) \quad v,w \in V(G)$

How can I use the hypothesis here?

score 3 · Accepted Answer · edited Jun 16 '20 at 10:30

There is not much hope in proving $BFS(G,u) = DFS(G,u)$ directly by mathematical induction on the number of nodes in $G$ or on the degree of $u$. The problem is that as an induction hypothesis that equality does not capture the "right" kind of information or cover the "right" cases that are useful for the induction step.

Approach one: the explicit description as reachable nodes

Instead, you can try proving separately that each side is equal to the set $R$ of nodes reachable from $u$, that is, $R=\{v\in V(G)\mid \text{there is a directed path from } u \text{ to } v \}$. More specifically, $$R=\{u\}\cup\left\{v\in V(G)\mid \text{there exist } u_0, u_1, \ldots,u_n \text{ such that } u_0=u,\ u_n=v,\ u_i\in V(G) \text{ for } 0\le i\le n \text{ and } (u_i,u_{i+1})\in E(G)\text{ for } 0\le i\lt n\right\}$$

You can prove the case of $DFS$ by mathematical induction on the total number of nodes of $G$. You can prove the case of $BFS$ by mathematical induction on the distance from $u$ to $v$. Or use whatever method you see fit.
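As a sanity check rather than a proof, the three sets can be compared on a small example. The sketch below assumes the graph is given as a dictionary of successor lists; the names `bfs`, `dfs`, `reachable` and `example` are made up for illustration, and `reachable` follows the definition of $R$ by collecting endpoints of walks with at most $|V|-1$ edges.

```python
from collections import deque

def bfs(graph, u):
    """Set of nodes visited by BFS from u."""
    visited = {u}
    queue = deque([u])
    while queue:
        n = queue.popleft()
        for m in graph.get(n, ()):
            if m not in visited:
                visited.add(m)
                queue.append(m)
    return visited

def dfs(graph, u):
    """Set of nodes visited by an iterative DFS from u."""
    visited = set()
    stack = [u]
    while stack:
        n = stack.pop()
        if n in visited:
            continue
        visited.add(n)
        stack.extend(graph.get(n, ()))
    return visited

def reachable(graph, u):
    """R from the definition: endpoints of walks from u with at most |V|-1 edges."""
    nodes = set(graph) | {m for succ in graph.values() for m in succ} | {u}
    layer = {u}      # endpoints of walks with exactly k edges
    result = {u}
    for _ in range(len(nodes) - 1):
        layer = {m for n in layer for m in graph.get(n, ())}
        result |= layer
    return result

# Hypothetical example: "e" has an edge into the rest but is not reachable from "a".
example = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"], "e": ["a"]}
assert bfs(example, "a") == reachable(example, "a")
assert dfs(example, "a") == reachable(example, "a")
```

The assertions only confirm the claim on one graph; the induction arguments are still needed for the general statement.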

Approach two: the characterization as nodes closed under neighbourhood

A set of nodes $S$ is said to be closed under neighbourhood if for any node $n \in S$, $S$ contains the adjacent nodes of $n$. That is, if $n\in S$ and $(n, m)\in E(G)$, then $m\in S$. Here are the critical observations on both $BFS$ and $DFS$.

Lemma 1. $DFS(G,u)$ is closed under neighbourhood.
Proof: It becomes obvious once we check what $DFS$ does when it discovers a new node.

Lemma 2. $BFS(G,u)$ is closed under neighbourhood.
Proof: It becomes obvious once we check what $BFS$ does when it pops out a node from the queue.

Lemmas 1 and 2 suggest considering the minimal set of nodes that contains $u$ and is closed under neighbourhood. Name it $C(G,u)$. It is enough to prove that both $BFS(G,u)$ and $DFS(G,u)$ are equal to $C(G,u)$. Both sets contain $u$ and, by the lemmas, are closed under neighbourhood, so by minimality both contain $C(G,u)$. Conversely, it is easy to verify that "the set of nodes visited so far is contained in $C(G,u)$" is an invariant of $BFS$ on $G$ starting from $u$. The same holds for $DFS$.
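To make $C(G,u)$ concrete, here is a brute-force sketch that computes it as the intersection of all closed sets containing $u$, then compares it with the DFS result. It assumes a dictionary-of-successors representation; `is_closed`, `minimal_closed_set`, `dfs` and `example` are illustrative names, and the subset enumeration is only feasible for tiny graphs.

```python
from itertools import combinations

def is_closed(graph, s):
    """True if every successor of every node in s is also in s."""
    return all(m in s for n in s for m in graph.get(n, ()))

def minimal_closed_set(graph, u):
    """C(G, u): intersection of all closed node sets that contain u."""
    nodes = sorted(set(graph) | {m for succ in graph.values() for m in succ} | {u})
    result = set(nodes)  # V(G) is closed and contains u, so start from it
    for size in range(len(nodes) + 1):
        for subset in combinations(nodes, size):
            s = set(subset)
            if u in s and is_closed(graph, s):
                result &= s
    return result

def dfs(graph, u, visited=None):
    """Set of nodes visited by a recursive DFS from u."""
    if visited is None:
        visited = set()
    visited.add(u)
    for m in graph.get(u, ()):
        if m not in visited:
            dfs(graph, m, visited)
    return visited

# Hypothetical example graph.
example = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"], "e": ["a"]}
assert is_closed(example, dfs(example, "a"))               # Lemma 1 on this graph
assert dfs(example, "a") == minimal_closed_set(example, "a")
```

Taking the intersection is justified because an intersection of closed sets is again closed, which is also why the minimal set is well defined.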

score 2 · Answer 2 · answered Nov 23 '18 at 19:44

Let's try and fix your induction. I'll transform it into an induction over $\mathbb{N}$; while not necessary -- structural induction is a thing! -- it's often easier.

Attempt one: induction over the naturals

Base case: BFS and DFS visit the same set of nodes for all graphs $G = (V, E)$ with $|V| = 1$, when started on the same node $u \in V$.
Induction hypothesis: Assume BFS and DFS visit the same set of nodes for all graphs $G = (V, E)$ with $|V| = n$, when started on the same node $u \in V$.
Inductive step: Let $G = (V, E)$ be an arbitrary graph with $|V| = n+1$.

Let $U \subseteq V$ be the connected component with $u \in U$. There are two cases:
- $U = V$: Let $v \in V \setminus \{u\}$ be arbitrary. Applying the induction hypothesis to $V' = V \setminus \{v\}$, we know that BFS and DFS visit the same set of nodes when run on the subgraph induced by $V'$.
  Problem 1: We don't know anything about $v$; we have nothing in hand to cross the border!
- $U \neq V$: From the induction hypothesis...
  Problem 2: We know nothing about $U$ or $V \setminus U$!
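Problem 1 can be seen on a three-node example. The sketch below uses a hypothetical path graph and illustrative helpers `bfs` and `remove_node`; it shows that putting the removed node back can change the visited set by more than that one node, which the hypothesis about the smaller graph says nothing about.

```python
from collections import deque

def bfs(graph, u):
    """Set of nodes visited by BFS from u."""
    visited = {u}
    queue = deque([u])
    while queue:
        n = queue.popleft()
        for m in graph.get(n, ()):
            if m not in visited:
                visited.add(m)
                queue.append(m)
    return visited

def remove_node(graph, v):
    """Subgraph induced by all nodes except v."""
    return {n: [m for m in succ if m != v]
            for n, succ in graph.items() if n != v}

# Hypothetical path graph u -> v -> w.
graph = {"u": ["v"], "v": ["w"], "w": []}
smaller = remove_node(graph, "v")

assert bfs(smaller, "u") == {"u"}            # what the hypothesis talks about
assert bfs(graph, "u") == {"u", "v", "w"}    # what the step must establish
```

Here the node `w` only becomes visible through `v`, so knowing the visited set of the smaller graph does not tell us which nodes lie "behind" the removed one.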

Attempt two: strong induction

Problem 2 is easy to fix: strengthen the induction hypothesis to cover all small graphs:

Induction hypothesis: Assume BFS and DFS visit the same set of nodes for all graphs $G = (V, E)$ with $|V| \leq n$, when started on the same node $u \in V$.

Assuming we have established that both BFS and DFS do not visit nodes not connected to $u$, the second case is simple now.

The fundamental issue

Problem 1 persists. The problem is that for either algorithm, knowing that some set of nodes is visited tells you little about "how" $v$ is reached. The two algorithms will visit it at different points in time, via different edges. And that's the problem here: graphs and therewith the behaviour of graph algorithms are not defined solely by nodes -- but we have ignored the edges completely! And how the algorithms work, for that matter.

What you would (and should!) try now is to strengthen the claim. That is a common technique; you set out to prove more than the original claim so that the inductive hypothesis is strong enough to make the step. Here, you'd have to include the edges in some clever way.

However, since DFS and BFS treat edges very differently -- the respective sets of visited nodes after $k$ steps are wildly different! -- it is indeed unlikely that an induction for your claim will work out (nicely). A more fruitful approach is to show independently that both algorithms work correctly; building on observations of the form "if a node is visited, all its neighbours will eventually be visited", inductive proofs are indeed not too hard -- you can (and have to!) use how the algorithms actually work.
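The difference in intermediate states is easy to observe. This sketch assumes a dictionary-of-successors graph and uses illustrative functions `bfs_order` and `dfs_order` that record the order in which nodes are visited; the example tree is hypothetical.

```python
from collections import deque

def bfs_order(graph, u):
    """Nodes in the order BFS dequeues them, starting from u."""
    order, seen, queue = [], {u}, deque([u])
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in graph.get(n, ()):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return order

def dfs_order(graph, u):
    """Nodes in the order a recursive DFS first discovers them."""
    order, seen = [], set()

    def visit(n):
        seen.add(n)
        order.append(n)
        for m in graph.get(n, ()):
            if m not in seen:
                visit(m)

    visit(u)
    return order

# Hypothetical tree: a -> b -> d and a -> c -> e.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
b, d = bfs_order(graph, "a"), dfs_order(graph, "a")

assert set(b[:3]) != set(d[:3])   # after 3 steps: {a, b, c} vs {a, b, d}
assert set(b) == set(d)           # at termination the sets agree
```

The prefixes disagree while the final sets coincide, which is why an invariant relating the two runs step by step is hard to find, and why proving each algorithm correct on its own is the easier route.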