
I am relatively new to persistent homology and topological data analysis. I would like to use RIPSER, DIPHA or GUDHI to compute barcodes, which give a persistence diagram. Here are my questions:
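
To make "barcodes" concrete, here is a minimal sketch of the simplest case these libraries handle: the 0-dimensional barcode (connected components) of a Vietoris-Rips filtration, computed with a union-find over edges sorted by length. It assumes Euclidean points and the convention that an edge appears at scale equal to its length. It is only an illustration of what is being computed, not how RIPSER, DIPHA or GUDHI are implemented, and it does not compute higher-dimensional features (loops, voids).

```python
import math
from itertools import combinations


def h0_barcode(points):
    """0-dimensional persistence of the Vietoris-Rips filtration of a point cloud.

    Every point is born at scale 0. When an edge of length r first joins two
    connected components, one component dies at r. One component never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (their filtration value).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, r))  # a component dies when it merges into another
    if n > 0:
        bars.append((0.0, math.inf))  # the component that survives forever
    return bars
```

For the points `(0, 0)`, `(1, 0)`, `(5, 0)` this gives the bars `(0, 1)`, `(0, 4)` and `(0, inf)`: the two nearby points merge at scale 1, the isolated point joins at scale 4. Because every pair of points is considered, the cost grows quadratically in the number of points even for this easiest dimension.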

1.) How many data points can such libraries handle on an average computer?

2.) What ways are there to analyze and compare two different persistence diagrams? For example, I have heard of the Wasserstein distance and the bottleneck distance. Is there a library or software for comparing two such diagrams?
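
Both distances are defined through an optimal matching between the two diagrams, where any point may instead be matched to the diagonal (cost: its L-infinity distance to the diagonal, i.e. half its lifetime). The bottleneck distance minimizes the largest matching cost; the p-Wasserstein distance minimizes the p-th root of the sum of p-th powers. The sketch below assumes finite (birth, death) pairs and the L-infinity ground metric, and it brute-forces all matchings, so it only works for a handful of points. For real diagrams a library routine such as Gudhi's `gudhi.bottleneck_distance` is the practical choice (check its documentation for the exact arguments).

```python
import math
from itertools import permutations


def _diag_cost(p):
    # L-infinity distance from (birth, death) to the nearest point on the diagonal.
    return (p[1] - p[0]) / 2.0


def _cost_matrix(d1, d2):
    """Square (n+m) x (n+m) cost matrix: real points plus one diagonal slot
    per point of the other diagram, so every point can be sent to the diagonal."""
    n, m = len(d1), len(d2)
    size = n + m
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:  # point of d1 matched to point of d2
                a, b = d1[i], d2[j]
                cost[i][j] = max(abs(a[0] - b[0]), abs(a[1] - b[1]))
            elif i < n:  # point of d1 sent to the diagonal
                cost[i][j] = _diag_cost(d1[i])
            elif j < m:  # point of d2 sent to the diagonal
                cost[i][j] = _diag_cost(d2[j])
            # diagonal slot matched to diagonal slot costs 0
    return cost


def diagram_distances(d1, d2, p=1.0):
    """Return (bottleneck, p-Wasserstein) by trying every matching.

    Only for tiny diagrams: the loop visits (n+m)! permutations.
    """
    cost = _cost_matrix(d1, d2)
    size = len(cost)
    best_max, best_sum = math.inf, math.inf
    for perm in permutations(range(size)):
        costs = [cost[i][perm[i]] for i in range(size)]
        best_max = min(best_max, max(costs, default=0.0))
        best_sum = min(best_sum, sum(c ** p for c in costs))
    return best_max, best_sum ** (1.0 / p)
```

For `d1 = [(0.0, 1.0), (0.0, 2.0)]` and `d2 = [(0.0, 1.25), (0.0, 2.5)]` the optimal matching pairs the points in order, giving a bottleneck distance of 0.5 and a 1-Wasserstein distance of 0.75. The bottleneck distance only reports the single worst discrepancy, while the Wasserstein distance also reflects many small ones.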

3.) How can I interpret persistence diagrams with many data points?
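
A common first step with a crowded diagram is to separate points far from the diagonal (long-lived features) from the cloud of short-lived points near it, which is usually treated as noise. The sketch below sorts features by lifetime and picks a cutoff in the largest gap between consecutive lifetimes. The gap rule is only a heuristic assumed here for illustration, not a standard of any of the libraries mentioned; in practice the threshold has to be justified for the data at hand.

```python
import math


def significant_features(diagram, min_persistence):
    """Keep (birth, death) pairs with lifetime >= min_persistence,
    sorted from most to least persistent. Infinite bars are always kept."""
    kept = [(b, d) for b, d in diagram if d - b >= min_persistence]
    return sorted(kept, key=lambda bd: bd[1] - bd[0], reverse=True)


def largest_gap_threshold(diagram):
    """Heuristic cutoff: midpoint of the largest gap between consecutive
    sorted finite lifetimes. Returns 0.0 if there are fewer than two."""
    lifetimes = sorted((d - b for b, d in diagram if math.isfinite(d)), reverse=True)
    if len(lifetimes) < 2:
        return 0.0
    gaps = [(lifetimes[k] - lifetimes[k + 1], k) for k in range(len(lifetimes) - 1)]
    _, k = max(gaps)
    return (lifetimes[k] + lifetimes[k + 1]) / 2.0
```

For the diagram `[(0.0, 0.125), (0.0, 0.25), (0.5, 0.75), (0.0, 2.0), (0.5, 3.0)]` the lifetimes are 0.125, 0.25, 0.25, 2.0 and 2.5, the largest gap lies between 0.25 and 2.0, and only the two long-lived points `(0.5, 3.0)` and `(0.0, 2.0)` remain.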

I will get two different persistence diagrams, which should not differ much. I would like to compare them and find the differences that do exist.
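
A distance gives a single number; to see *where* two similar diagrams disagree, one option is to compare their Betti curves, where beta(t) counts the bars alive at scale t. The sketch below is one such vectorization among several (persistence landscapes and persistence images are others); the scale grid is a hypothetical user choice, and the half-open convention birth <= t < death is an assumption.

```python
def betti_curve(diagram, scales):
    """beta(t) = number of (birth, death) bars with birth <= t < death."""
    return [sum(1 for b, d in diagram if b <= t < d) for t in scales]


def betti_curve_difference(d1, d2, scales):
    """Signed difference beta_1(t) - beta_2(t) on a common grid.
    Non-zero entries mark scales at which the two diagrams disagree."""
    return [x - y for x, y in zip(betti_curve(d1, scales), betti_curve(d2, scales))]
```

With `d1 = [(0.0, 1.0), (0.0, 2.0)]`, `d2 = [(0.0, 1.0), (0.5, 1.5)]` and the grid `0.0, 0.25, ..., 1.75`, the difference is `[1, 1, 0, 0, 0, 0, 1, 1]`: the diagrams agree in the middle of the range and differ at small scales (the second feature appears later in `d2`) and at large scales (it dies earlier in `d2`). Betti curves discard which point corresponds to which, so they complement rather than replace the matching-based distances.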

EDIT: Regarding question 1, the article "A Roadmap for the Computation of Persistent Homology" by Otter et al. gives an idea. The maximum size of the complex $K$ reported for GUDHI and RIPSER is $3.4\cdot 10^9$, while the size of $K$ is $2^{\mathcal{O}(N)}$ in the worst case, where $N$ is the cardinality of the vertex set. Their experiments used data sets with up to $N=2000$ points. SimBa, in contrast, was run on data sets with $N\ge 250\,000$ after some dimensionality reduction by PCA (cf. "SimBa: An Efficient Tool for Approximating Rips-filtration Persistence via Simplicial Batch-collapse" by Dey, Shi and Wang). The size of the sparsified Vietoris–Rips complex $K$ used by SimBa is $\mathcal{O}(N)$.
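
The $2^{\mathcal{O}(N)}$ bound is easy to make tangible. At a scale where all $N$ points are pairwise connected, the Vietoris–Rips complex contains every subset of up to $d+1$ points as a $d$-simplex, so the count up to dimension $d$ is $\sum_{k=0}^{d} \binom{N}{k+1}$, and including all dimensions gives $2^N - 1$. The sketch below assumes this worst case; at smaller scales, or with a dimension cap as in Ripser's settings, the actual complex is smaller.

```python
from math import comb


def rips_simplex_count(n_points, max_dim):
    """Simplices of dimension 0..max_dim in the full Vietoris-Rips complex
    (all points pairwise within the scale): sum_{k=0}^{max_dim} C(N, k+1)."""
    return sum(comb(n_points, k + 1) for k in range(max_dim + 1))


for dim in range(4):
    print(dim, rips_simplex_count(2000, dim))
```

For $N=2000$, computing homology up to dimension 1 needs simplices up to dimension 2, and the count is already about $1.3\cdot 10^9$, the same order of magnitude as the $3.4\cdot 10^9$ figure above. Going up to dimension 3 pushes it to roughly $6.7\cdot 10^{11}$, which is why sparsified complexes such as the one in SimBa matter for larger $N$.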

  • It depends on your computer. However, for computing persistence diagrams (PDs) of point clouds, I would recommend Ripser because it is built to do only this. It is MUCH faster than Gudhi, and perhaps also than Dionysus.
  • In Python, there are Dionysus and Gudhi. In R, the most popular choice is R-TDA.
  – SiXUlm, May 12 '19 at 23:19