
There are several classic datasets for machine learning classification/regression tasks. The most popular are:

But does anyone know of similar datasets for network analysis / graph theory? More concretely, I'm looking for gold-standard datasets for comparing, evaluating, and learning:

  1. centrality measures;
  2. network clustering algorithms.

I don't need a huge list of publicly available networks/graphs, just a couple of genuinely must-know datasets.

EDIT:

It's quite difficult to define the exact features of a "gold standard dataset", but here are some thoughts. I think a real classic dataset should satisfy these criteria:

  • Multiple references in articles and textbooks;
  • Inclusion in well-known network analysis software packages;
  • Sufficient time of existence;
  • Usage in a number of courses on graph analysis.

For my field of interest, I also need labeled classes for vertices and/or precomputed (or predefined) "authority scores" (i.e. centrality estimates). After asking this question I continued searching, and here are some suitable examples:

  • Zachary's Karate Club: introduced in 1977, cited more than 1.5k times (according to Google Scholar); vertices have a Faction attribute (which can be used for clustering).
  • Erdos Collaboration Network: unfortunately, I haven't found this network as a data file, but it's rather famous, and if someone enriched the network with data on the mathematicians' specialisations, it could also be used for testing clustering algorithms.
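
To make the labeled-vertices criterion concrete, here is a minimal sketch of how such labels might be used to score a clustering. Everything in it is an assumption made for illustration: the six-vertex graph, its two faction labels, the "predicted" clustering, and the helper names `degree_centrality` and `rand_index`. A real comparison would substitute a dataset such as the Karate Club.

```python
from itertools import combinations

# Hypothetical toy graph: two triangles joined by one bridge edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
# Hypothetical ground-truth labels, playing the role of a "Faction" attribute.
truth = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}


def degree_centrality(edges):
    """Degree of each vertex divided by (n - 1), n = number of vertices seen."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}


def rand_index(labels_a, labels_b):
    """Fraction of vertex pairs that both labelings treat the same way
    (together in both, or apart in both)."""
    pairs = list(combinations(sorted(labels_a), 2))
    agree = sum(
        (labels_a[u] == labels_a[v]) == (labels_b[u] == labels_b[v])
        for u, v in pairs
    )
    return agree / len(pairs)


# A hypothetical clustering result that puts vertex 2 in the wrong group.
predicted = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}

centrality = degree_centrality(edges)
score = rand_index(truth, predicted)
print(centrality)
print(score)
```

The Rand index is only one of several agreement measures; it is used here because it needs nothing beyond the two labelings.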
sobach

3 Answers


What you are looking for can be found in KONECT (the website is down as I'm writing this, but it should be fixed soon!). It's close to the most comprehensive data collection for network analysis. But the question is: which dataset is the standard one to use?

Well, there is no clear answer, except for Zachary's Karate Club!

If you do a literature review of community detection algorithms, you'll see that almost all of the notable papers use different networks. My suggestion is to go through what Andrea Lancichinetti and Santo Fortunato did for benchmark graphs. They proposed some benchmark graph generation algorithms, e.g. this one.
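
The appeal of generated benchmarks is that the community assignment is known in advance, so an algorithm's output can be checked against it. The sketch below is not the Lancichinetti and Fortunato generator; it is a much simpler planted-partition model, swapped in only to illustrate the idea. The function name `planted_partition` and the parameters `p_in` and `p_out` are assumptions made for this example.

```python
import random
from itertools import combinations


def planted_partition(n_groups, group_size, p_in, p_out, seed=None):
    """Generate an undirected graph with known communities.

    Vertices in the same group are linked with probability p_in,
    vertices in different groups with probability p_out.
    Returns (edges, membership), where membership maps vertex -> group id.
    """
    rng = random.Random(seed)
    n = n_groups * group_size
    membership = {node: node // group_size for node in range(n)}
    edges = []
    for u, v in combinations(range(n), 2):
        p = p_in if membership[u] == membership[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return edges, membership


# Illustrative parameters: 4 groups of 10 vertices, dense inside, sparse between.
edges, membership = planted_partition(4, 10, p_in=0.8, p_out=0.05, seed=42)
intra = sum(membership[u] == membership[v] for u, v in edges)
print(len(edges), "edges,", intra, "of them inside a community")
```

Raising `p_out` toward `p_in` blurs the communities, which gives a rough way to see where a clustering algorithm stops recovering the planted groups.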

Hope it helps :)

Kasra Manshaei

Maybe you can check here - http://snap.stanford.edu/data/

For each dataset you will also find references to the works in which it has been used.
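
Many of the SNAP datasets are distributed as plain-text edge lists, with comment lines starting with `#` and one whitespace-separated pair of node IDs per line. A minimal sketch of reading that layout follows; the sample text is made up rather than taken from any SNAP file, the helper name `read_edge_list` is hypothetical, and the graph is treated as undirected, which is an assumption that does not hold for every dataset.

```python
import io
from collections import defaultdict

# Made-up sample in the edge-list style described above (not real SNAP data).
sample = """\
# Hypothetical sample graph
# FromNodeId\tToNodeId
0\t1
0\t2
1\t2
2\t3
"""


def read_edge_list(handle):
    """Build an undirected adjacency map from an edge-list text stream."""
    adjacency = defaultdict(set)
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        u, v = line.split()[:2]
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


graph = read_edge_list(io.StringIO(sample))
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
print(degrees)
```

For a downloaded file, the `io.StringIO(sample)` argument would be replaced by an opened file handle.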

Alexey Grigorev

The only thing I know of is benchmark data for graph databases, such as Neo4j.

You may find links similar to this one: http://istc-bigdata.org/index.php/benchmarking-graph-databases/

There you can find data for testing network analysis and graph theory methods.

Furthermore, you could use the Twitter/Facebook APIs to collect your own data. This is a fallback in case you do not find the data you are looking for.

adesantos