
Assume that we are given a real-life graph (the DBLP network, in my case) whose degree distribution follows a power law: many nodes have 1 or 2 neighbors, and only a few nodes have hundreds of neighbors.

A random walk ends when it returns to the initial node or after it has taken 3 steps. If we start random walks from every node of this graph, should we start an equal number of walks from each node? If so, walks from small-degree nodes will often return to where they started, and we will not explore large portions of the network. This is because small-degree nodes are more often neighbors of other small-degree nodes, so there are not many paths to walk on.
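A minimal Python sketch of the walk described above, assuming the graph is stored as a dict mapping each node to a list of its neighbors (that representation and the names `truncated_walk` and `return_rate` are illustrative, not part of the question). It estimates, for one start node, how often the truncated walk returns home:

```python
import random


def truncated_walk(adj, start, max_steps=3, rng=random):
    """Simple random walk from `start` that stops when it returns to
    `start` or after `max_steps` steps, whichever comes first.

    `adj` (assumed format) maps each node to a list of its neighbors.
    Returns the list of visited nodes, beginning with `start`.
    """
    path = [start]
    node = start
    for _ in range(max_steps):
        neighbors = adj[node]
        if not neighbors:            # isolated node: nowhere to go
            break
        node = rng.choice(neighbors)  # uniform choice among neighbors
        path.append(node)
        if node == start:            # returned to the initial node
            break
    return path


def return_rate(adj, start, walks, max_steps=3, seed=0):
    """Fraction of `walks` truncated walks from `start` that return to it."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(walks):
        path = truncated_walk(adj, start, max_steps, rng)
        if len(path) > 1 and path[-1] == start:
            hits += 1
    return hits / walks
```

On a small star (hub `1` with leaves `0, 2, 3, 4`), walks from the hub always return at step 2, while walks from leaf `0` return only when the hub picks `0` again, so about a quarter of the time.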

I believe there should be a way to decide on the number of walks to minimize computational costs.

Juho
Cuneyt

1 Answer


If you treat the given graph as a Markov chain, you can compute its steady-state (stationary) distribution $\pi$. From it, you can compute the mean recurrence time $M_j$ (the expected time to return to the starting node). See: Mean recurrence time.

$\pi_j=1/M_j$
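To see the relation $\pi_j = 1/M_j$ empirically, here is a hedged Monte Carlo sketch that estimates $M_j$ by averaging the return times of unrestricted simple random walks. The function name and the adjacency-dict input are assumptions for illustration:

```python
import random


def mean_recurrence_time(adj, start, samples=10_000, seed=0):
    """Monte Carlo estimate of M_start: the expected number of steps a
    simple random walk needs to return to `start` (no step limit).

    Assumes the component containing `start` is finite, connected and has
    at least one edge, so every walk returns with probability 1.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        node, steps = start, 0
        while True:
            node = rng.choice(adj[node])  # move to a uniform random neighbor
            steps += 1
            if node == start:
                break
        total += steps
    return total / samples
```

On a triangle every node has $\pi_j = 1/3$, and the estimate comes out close to $M_j = 3$.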

If, for every node $n_j$, the probability of moving to each neighbour is $1/\mathrm{out\_deg}(n_j)$ (a simple random walk), then, once the degrees are known, you can compute $\pi_j$ in constant time.
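Under that uniform-transition assumption on a connected undirected graph, the standard closed form is $\pi_j = \deg(n_j) / \sum_k \deg(n_k)$, so after one pass to collect degrees, each $\pi_j$ is a single division. A sketch with a hypothetical `stationary_lookup` helper, again assuming an adjacency dict:

```python
def stationary_lookup(adj):
    """Precompute what is needed to evaluate pi_j in O(1) per node for a
    simple random walk (each neighbour chosen with prob. 1/deg) on a
    connected undirected graph given as an adjacency dict."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    degree_sum = sum(degree.values())  # one O(|V|) pass

    def pi(v):
        return degree[v] / degree_sum  # O(1) per query

    return pi
```

For a star with hub `1` and four leaves, `pi(1)` is 0.5 and each leaf gets 0.125.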

This does not assume that you stop after three steps (why do you actually need that?). From here you could experiment with a damping factor $d$: the probability that the walk ends at each step (see: Damping factor). You can combine these two approaches to get your estimate.
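A sketch of a walk that combines both stopping rules, taking $d$ as defined here, i.e. the probability of ending before each step (in PageRank, $d$ usually denotes the probability of continuing instead). `damped_walk` is a hypothetical name and the adjacency dict is assumed:

```python
import random


def damped_walk(adj, start, d, rng=random):
    """Random walk from `start` that, before each step, ends with
    probability `d`; it also ends when it returns to `start`.
    Returns the list of visited nodes."""
    if not 0 < d <= 1:
        raise ValueError("d must be in (0, 1]")
    path = [start]
    node = start
    while rng.random() >= d:          # continue with probability 1 - d
        node = rng.choice(adj[node])  # uniform random neighbour
        path.append(node)
        if node == start:             # returned home: stop
            break
    return path
```

Setting `d=1` ends every walk immediately; smaller values let walks run longer, up to the natural return time.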

-edit-

So here goes. Given an undirected graph $G=(V,E)$ you can find the stationary distribution of the corresponding Markov chain $M$ with transition matrix $P$ as follows:

$\pi_i=\deg(v_i)/(2|E|)$

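where $\deg(v_i)$ is the degree of vertex $v_i$, $|E|$ is the number of edges in graph $G$, and $\pi_i$ is the $i$-th component of the row vector $\pi$ that satisfies

$\lim_{k\to\infty} P^k=\mathbf{1}\pi$

where $\mathbf{1}$ is the all-ones column vector, i.e., every row of $P^k$ converges to $\pi$.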
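The limit can be checked numerically. This pure-Python sketch (hypothetical helper names; adjacency dict assumed) builds $P$ for the simple random walk and computes row $i$ of $P^k$ by repeated vector-matrix products, without forming $P^k$ itself:

```python
def transition_matrix(adj, nodes):
    """Row-stochastic P for a simple random walk: P[i][j] = 1/deg(i) if j ~ i."""
    index = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    P = [[0.0] * n for _ in range(n)]
    for v in nodes:
        for w in adj[v]:
            P[index[v]][index[w]] += 1.0 / len(adj[v])
    return P


def row_of_power(P, i, k):
    """Row i of P^k, computed as e_i * P * P * ... * P (k products)."""
    n = len(P)
    row = [0.0] * n
    row[i] = 1.0                       # start from the unit row vector e_i
    for _ in range(k):
        row = [sum(row[a] * P[a][b] for a in range(n)) for b in range(n)]
    return row
```

For a triangle `0-1-2` with an extra vertex `3` attached to `0` (connected, and the triangle makes it aperiodic), every row of $P^{200}$ matches each degree divided by the degree sum, $(3/8, 2/8, 2/8, 1/8)$, to within floating-point error.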

Note that this limit exists and $\pi$ is unique if the Markov chain is irreducible and aperiodic.

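  • Irreducibility essentially means that you can get from every vertex $i$ to every vertex $j$. If the graph is not connected, you can apply the same approach to each connected component separately, with $|E|$ counting only the edges of that component.
  • Aperiodicity means that the greatest common divisor of the lengths of all cycles (possible return times) in the Markov chain is 1. For a walk on an undirected graph this holds exactly when the component is not bipartite, i.e., when it contains an odd cycle. So if a component contains at least one paper written by three (or more) authors, those authors form a triangle and the condition is trivially satisfied.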
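Both conditions can be checked per component with one breadth-first search. A sketch (hypothetical name, adjacency dict assumed) that 2-colours each component: an edge joining two vertices of the same colour means an odd cycle, so the component is not bipartite and its walk is aperiodic:

```python
from collections import deque


def components_with_periodicity(adj):
    """Split an undirected graph into connected components (each gives an
    irreducible chain) and flag whether the simple random walk on each is
    aperiodic, i.e. the component is not bipartite (has an odd cycle)."""
    colour = {}
    result = []
    for root in adj:
        if root in colour:
            continue
        colour[root] = 0
        members, aperiodic = [root], False
        queue = deque([root])
        while queue:                      # BFS 2-colouring of this component
            v = queue.popleft()
            for w in adj[v]:
                if w not in colour:
                    colour[w] = 1 - colour[v]
                    members.append(w)
                    queue.append(w)
                elif colour[w] == colour[v]:
                    aperiodic = True      # odd cycle found: not bipartite
        result.append((members, aperiodic))
    return result
```

A three-author paper contributes a triangle, so its component is flagged aperiodic; a component consisting of a single co-author pair is bipartite and therefore periodic.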

The mean recurrence time then follows from the formula above: $M_i = 1/\pi_i$.
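Applying this per connected component gives every node's mean recurrence time from degrees alone. A sketch assuming an adjacency dict, with the hypothetical name `mean_recurrence_times`; isolated vertices are skipped because no walk is defined there:

```python
def mean_recurrence_times(adj):
    """M_i = 1/pi_i = (sum of degrees in v_i's component) / deg(v_i)
    for a simple random walk on an undirected graph."""
    seen = set()
    times = {}
    for root in adj:
        if root in seen or not adj[root]:
            continue                       # already handled, or isolated
        # collect the component of `root` with an iterative DFS
        stack, comp = [root], []
        seen.add(root)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        degree_sum = sum(len(adj[v]) for v in comp)  # twice the edge count
        for v in comp:
            times[v] = degree_sum / len(adj[v])
    return times
```

For the triangle-plus-pendant component above this gives $M = (8/3, 4, 4, 8)$, and for a triangle on its own it gives 3 per node, matching the Monte Carlo estimate.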

Nejc