18

I have implemented a topological sort based on the Wikipedia article which I'm using for dependency resolution, but it returns a linear list. What kind of algorithm can I use to find the independent paths?

Gilles 'SO- stop being evil'
  • 44,159
  • 8
  • 120
  • 184
Masse
  • 283
  • 2
  • 5

1 Answers1

21

I assume that an edge $(u,v)$ means that $u$ has to be executed before $v$. If this is not the case, turn around all edges. I furthermore assume that you are less interested in paths (those are already given by the DAG) than in a good execution strategy given the dependencies.

You can easily adapt the topological sort procedure: instead of appending, merge all items of the same "depth" to one set. You get a list of sets, each of which contains items you can execute/install in parallel. Formally, the sets $S_i$ are defined thus for the graph $G = (V,E)$:

$\qquad \displaystyle \begin{align} S_0 &= \{ v \in V \mid \forall u \in V. (u,v) \notin E \} \\ S_{i+1} &= \{v \in V \mid \forall u \in V. (u,v) \in E \to u \in \bigcup_{k=0}^i S_k \} \end{align}$

You can then execute your tasks like this (let's assume there are $k$ sets):

for i=0 to k
  parallel foreach T in S_k
    execute T

Of course, this does not yield maximum throughput if tasks take different amounts of time: two parallel, independent linear chains sync after each element. To circumvent this, I suggest you work directly on the DAG with a parallel traversal starting in the source nodes -- those in $S_0$ -- and syncing/forking in nodes with several incoming/outgoing edges:

parallel foreach T in S_0
  recursive_execute T

where

recursive_execute T {
  atomic { if T.count++ < T.indeg then return }
  execute T
  parallel foreach T' in T.succ
    recursive_execute T'
}

and T.count is a simple counter holding the number of predecessors of T which have been executed already, T.indeg the number of predeccessors and T.succ the set of successors.

Raphael
  • 73,212
  • 30
  • 182
  • 400