Disclaimer: I'm not a computer scientist, so I basically have no idea what I'm talking about.
I have a large (>1000) set of directed acyclic graphs with a large (>1000) set of vertices each; the vertices are labeled.
I want to identify substructures that appear frequently over the whole set of graphs.
- A substructure is a graph of at least two vertices with specific labels. Such a substructure may appear once or more in one or more of the given input graphs. For example "a [vertex labeled A with two children labeled B] appears twice in graph U and once in graph V".
- A substructure must obey a set of pre-given rules which filter on the vertices' labels. As an example: A substructure that contains a vertex label A is interesting if the sub-graph is "a vertex labeled A that has at least one child labeled B and is not a sibling of a vertex labeled U or V". Substructures that do not conform to these rules may appear in the input graphs but are not of interest for the search.
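To make the example rule concrete, here is a minimal Python sketch. It assumes each graph is stored as a `labels` dict (vertex id to label) and a `children` dict (vertex id to list of child ids). These names, and the helper functions `build_parents` and `satisfies_rule`, are made up for illustration. "Sibling" is taken to mean any other vertex that shares at least one parent, which is only one possible reading in a DAG.

```python
from collections import defaultdict

def build_parents(children):
    """Invert the child lists so siblings can be looked up via shared parents."""
    parents = defaultdict(set)
    for parent, kids in children.items():
        for kid in kids:
            parents[kid].add(parent)
    return parents

def satisfies_rule(v, labels, children, parents):
    """Example rule: v is labeled A, has at least one child labeled B,
    and has no sibling labeled U or V."""
    if labels[v] != "A":
        return False
    has_b_child = any(labels[c] == "B" for c in children.get(v, ()))
    siblings = {s for p in parents.get(v, ())
                for s in children.get(p, ()) if s != v}
    has_bad_sibling = any(labels[s] in {"U", "V"} for s in siblings)
    return has_b_child and not has_bad_sibling

# Toy graph (hypothetical data): vertex 1 passes, vertex 4 has a sibling labeled U.
labels = {0: "R", 1: "A", 2: "B", 3: "B", 4: "A", 5: "U", 6: "B", 7: "R"}
children = {0: [1, 7], 1: [2, 3], 7: [4, 5], 4: [6]}
parents = build_parents(children)

seeds = [v for v in labels if satisfies_rule(v, labels, children, parents)]
print(seeds)
```

Only the vertices in `seeds` would then need to be considered as anchors for candidate substructures.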
I have tried to look into this and (as always seems to happen with me) the problem is NP-complete. As far as I can see, gSpan is the most common algorithm for solving this problem. However, as stated above, I'm not looking for just any common substructure in the graphs, only for those that obey certain rules. One should be able to use that to reduce the search space.
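One way the rules might shrink the search space is to count only structures anchored at vertices that pass a rule. The sketch below is a deliberately simplified stand-in, not a general frequent-subgraph miner: it only looks at one-level "star" patterns (a vertex plus the labels of its direct children). The functions `star_signature` and `count_patterns`, the `is_seed` predicate, and the `(labels, children)` graph representation are all assumptions made for illustration.

```python
from collections import Counter

def star_signature(v, labels, children):
    """Canonical form of a one-level pattern: root label plus sorted child labels."""
    return (labels[v], tuple(sorted(labels[c] for c in children.get(v, ()))))

def count_patterns(graphs, is_seed):
    """Count star patterns rooted at rule-conforming vertices.

    Returns (occurrences, support): total number of appearances, and the
    number of graphs in which each pattern appears at least once."""
    occurrences = Counter()
    support = Counter()
    for labels, children in graphs:
        seen_in_graph = set()
        for v in labels:
            # Skip leaves (a substructure needs at least two vertices)
            # and vertices rejected by the rule.
            if not children.get(v) or not is_seed(v, labels, children):
                continue
            sig = star_signature(v, labels, children)
            occurrences[sig] += 1
            seen_in_graph.add(sig)
        support.update(seen_in_graph)
    return occurrences, support

def is_seed(v, labels, children):
    """Hypothetical rule: labeled A with at least one child labeled B."""
    return labels[v] == "A" and any(labels[c] == "B" for c in children.get(v, ()))

# Toy data mirroring the example above: the pattern appears twice in U, once in V.
graph_u = ({0: "R", 1: "A", 2: "B", 3: "B", 4: "A", 5: "B", 6: "B"},
           {0: [1, 4], 1: [2, 3], 4: [5, 6]})
graph_v = ({0: "A", 1: "B", 2: "B"}, {0: [1, 2]})

occurrences, support = count_patterns([graph_u, graph_v], is_seed)
pattern = ("A", ("B", "B"))
print(occurrences[pattern], support[pattern])
```

Deeper patterns would need a proper canonical labeling of the subgraph, which is where the hard part of the problem lies.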
Any insight on how to approach this problem?
Update: I should probably add that the aforementioned rules can be recursive. For example: "a vertex labeled A with at least two children labeled B, each having at least one child labeled A". Apart from practicality, there are no limits on recursion depth.
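A recursive rule like this can be written down as a nested data structure and checked with a recursive function. The following Python sketch assumes a hypothetical `Rule` class in which each rule has a label and a list of `(min_count, sub_rule)` requirements on the children, and reuses the assumed `labels`/`children` dict representation. It reads "each having" as "at least two B children that each have an A child", and as a simplification it does not stop one child from being counted towards several requirements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    label: str
    needs: tuple = ()  # tuple of (min_count, Rule) requirements on the children

def matches(v, rule, labels, children):
    """True if v has the rule's label and, for every requirement,
    at least min_count of its children match the sub-rule."""
    if labels[v] != rule.label:
        return False
    for min_count, sub_rule in rule.needs:
        hits = sum(1 for c in children.get(v, ())
                   if matches(c, sub_rule, labels, children))
        if hits < min_count:
            return False
    return True

# "A with at least two children labeled B, each having at least one child labeled A"
rule = Rule("A", ((2, Rule("B", ((1, Rule("A")),))),))

# Toy DAG (hypothetical data): vertex 0 matches; vertex 4 does not,
# because its B child 6 has no child labeled A.
labels = {0: "A", 1: "B", 2: "B", 3: "A", 4: "A", 5: "B", 6: "B", 7: "A"}
children = {0: [1, 2], 1: [3], 2: [3], 4: [5, 6], 5: [7]}

matching = [v for v in labels if matches(v, rule, labels, children)]
print(matching)
```

The recursion depth is bounded by the nesting depth of the rule, not by the size of the graph. Because `Rule` is frozen and therefore hashable, results for `(v, rule)` pairs could be cached if the same sub-rule is checked on shared vertices many times.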