I apologize if this is the wrong place or too trivial a question for this community. What is the best data structure for storing a time-windowed streaming graph so that statistics over all nodes in the graph can be computed quickly, e.g., a running computation of the average degree?

I believe the best way to describe this is as follows: Let $G=(V,E)$ be a sparse time-evolving network modeled as an undirected graph with $n$ nodes and $m \geq n$ edges over time (in hours) $t = t_0, t_1, \dots$. Suppose further that at any time point $t_i$, any edges that are more than $k$ hours old are removed. In addition, nodes that have no edges connecting to them are removed.

My idea (e.g., for the average degree) is as follows: keep an array of edge arrival times as well as a degree array of size $n$ in which each element holds the current degree of one node. At any new time point $t_i$, we add 1 to the degree-array entries of the two endpoints of the new edge. We then remove all edges that are older than $k$ hours (i.e., added before $t_i-k$), decrementing the degrees of their endpoints. Any nodes that are left without edges are removed. At every time point, the running average degree is computed by taking the average of the degree array.

If I'm not mistaken, this algorithm would be $O(n)$ in run-time and $O(n)$ in space. Is there any better way of doing this?

The best data structure I could find in previous posts, such as this one, is the adjacency list. Additionally, is there any advantage to using disjoint-set data structures, as in the post here?

rshroff08

1 Answer

There's an efficient streaming algorithm for computing the average degree. Note that if you have the sum of the degrees and the number of vertices, you can compute the average -- so we'll try to keep track of those two values.

Also note that if you delete or insert an edge, it is easy to update the sum of the degrees. If you delete or insert a vertex, it is easy to update the number of vertices.
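To spell out these updates, here is the bookkeeping in symbols. The notation is introduced here for illustration and is not from the question: $S_t$ is the sum of the degrees at time $t$, $n_t$ is the number of vertices with at least one edge in the window, and $m_t$ is the number of edges in the window. Since every edge contributes to the degree of both of its endpoints,

$$\bar{d}_t = \frac{S_t}{n_t} = \frac{2\,m_t}{n_t}.$$

Inserting an edge increases $S_t$ by 2 and increases $n_t$ by the number of its endpoints that previously had degree 0. Deleting an edge decreases $S_t$ by 2 and decreases $n_t$ by the number of its endpoints whose degree drops to 0.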

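Taking all of this into account, we do $O(1)$ amortized work per edge arrival: each edge is inserted once and deleted once, and each insertion or deletion takes constant time. (To find the expired edges, keep the edges in a queue ordered by arrival time and pop from the front; to detect when a vertex becomes edge-less, keep a per-vertex degree counter.) At every time step we can then infer the average degree with a single division. This is as efficient as one could possibly hope for.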
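For concreteness, here is a minimal Python sketch of this bookkeeping. It is one possible reading of the idea, not a definitive implementation: the class name `WindowedAverageDegree` and its method names are made up for illustration, and it assumes that edges arrive in non-decreasing time order, that parallel edges each count toward the degree, and that an edge expires once it was added before $t_i - k$, as in the question.

```python
from collections import defaultdict, deque


class WindowedAverageDegree:
    """Running average degree over a sliding window of k hours (illustrative sketch)."""

    def __init__(self, k):
        self.k = k
        self.edges = deque()            # (arrival_time, u, v), oldest edge at the left
        self.degree = defaultdict(int)  # only vertices with degree >= 1 are kept
        self.degree_sum = 0             # sum of all degrees = 2 * number of edges
        self.num_vertices = 0           # number of vertices with at least one edge

    def _bump(self, v, delta):
        # Change the degree of v by delta (+1 or -1), updating both aggregates.
        old = self.degree[v]
        new = old + delta
        if old == 0 and new > 0:
            self.num_vertices += 1      # v just entered the graph
        if new == 0:
            self.num_vertices -= 1      # v became edge-less, so drop it
            del self.degree[v]
        else:
            self.degree[v] = new
        self.degree_sum += delta

    def expire(self, now):
        # Remove every edge that was added before now - k.
        while self.edges and self.edges[0][0] < now - self.k:
            _, u, v = self.edges.popleft()
            self._bump(u, -1)
            self._bump(v, -1)

    def add_edge(self, t, u, v):
        # Assumes t is at least as large as every earlier arrival time.
        self.expire(t)
        self.edges.append((t, u, v))
        self._bump(u, +1)
        self._bump(v, +1)

    def average_degree(self):
        if self.num_vertices == 0:
            return 0.0
        return self.degree_sum / self.num_vertices


w = WindowedAverageDegree(k=2)
w.add_edge(0, "a", "b")
w.add_edge(1, "b", "c")
w.add_edge(3, "c", "d")   # the edge (a, b) from time 0 expires here
print(w.average_degree())
```

Each edge is appended to the queue once and popped at most once, so the total work is proportional to the number of edges seen, and `average_degree` is a single division. The space used is proportional to the number of edges and vertices currently in the window.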

D.W.