
I have a file containing a subset of the possible strings of a context-free language. I am looking for a way to induce the grammar from this information. Is that possible?

Gilles 'SO- stop being evil'



If you have only positive examples (strings that are in the language), then in general, no, you cannot infer the grammar. For all you can tell, the language might be $\Sigma^*$: every string in your sample is also in $\Sigma^*$, so no matter how many example strings you have, you'll never be able to rule that out as the language.
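To make the $\Sigma^*$ argument concrete, here is a small Python sketch with a hypothetical sample over the alphabet $\{a, b\}$. The same positive examples are consistent with both $\{a^n b^n : n \ge 0\}$ and $\Sigma^*$, so the sample alone cannot tell the two apart. The membership functions and the sample are illustrative only.

```python
def in_anbn(w: str) -> bool:
    """Membership in {a^n b^n : n >= 0}, a context-free language."""
    n = len(w) // 2
    return w == "a" * n + "b" * n


def in_sigma_star(w: str) -> bool:
    """Membership in Sigma* for the alphabet {a, b}: every string over a and b."""
    return set(w) <= {"a", "b"}


# Hypothetical positive sample: every string here is in the unknown language.
positive = ["", "ab", "aabb", "aaabbb"]

# Both candidate languages contain every positive example...
assert all(in_anbn(w) for w in positive)
assert all(in_sigma_star(w) for w in positive)

# ...but they disagree on strings outside the sample, such as "ba".
print(in_anbn("ba"), in_sigma_star("ba"))  # False True
```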

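However, if you have more information, such as negative examples or a teacher who can answer questions about the language, there are known solutions. You can take a look at Angluin's L* algorithm, which learns a regular language (as a DFA) from membership queries ("is this string in the language?") and equivalence queries ("is this hypothesis correct, and if not, what is a counterexample?"). If you need a software implementation, take a look at LearnLib. See also Wikipedia's page on grammar induction for other approaches.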
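As a rough illustration of how Angluin-style learning uses that extra information, here is a minimal Python sketch of an L*-style learner (L* targets regular languages, represented as DFAs). It assumes a teacher that answers membership queries (`member`) and equivalence queries (`equivalent`). The equivalence query is approximated here by brute-force comparison on all strings up to a fixed length, which is an assumption for the demo, not how a real teacher works. Counterexamples are handled by adding all of their suffixes to the column set (the Maler-Pnueli variant) rather than by Angluin's original prefix-based update. All function names are hypothetical, and this is not LearnLib's API.

```python
from itertools import product
from typing import Callable, Optional

Member = Callable[[str], bool]


def lstar(alphabet: list[str], member: Member,
          equivalent: Callable[[Member], Optional[str]]):
    """Minimal L*-style learner. Returns (hypothesis, S, E)."""
    cache: dict[str, bool] = {}

    def mq(w: str) -> bool:
        # Membership query, cached so each string is asked at most once.
        if w not in cache:
            cache[w] = member(w)
        return cache[w]

    S = [""]  # access strings; their rows are kept pairwise distinct
    E = [""]  # distinguishing suffixes; E[0] == "" records acceptance

    def row(s: str) -> tuple:
        return tuple(mq(s + e) for e in E)

    while True:
        # Close the table: every one-letter extension of S must match a row of S.
        closed = False
        while not closed:
            closed = True
            rows_S = {row(s) for s in S}
            for s, a in product(list(S), alphabet):
                if row(s + a) not in rows_S:
                    S.append(s + a)
                    closed = False
                    break

        # Hypothesis DFA: one state per distinct row, named by its access string.
        state_of = {row(s): s for s in S}

        def hypothesis(w: str) -> bool:
            cur = ""
            for a in w:
                cur = state_of[row(cur + a)]
            return mq(cur)

        cex = equivalent(hypothesis)
        if cex is None:
            return hypothesis, S, E
        # Add every non-empty suffix of the counterexample as a new column.
        for i in range(len(cex)):
            if cex[i:] not in E:
                E.append(cex[i:])


def target(w: str) -> bool:
    # Example target: strings over {a, b} whose number of a's is divisible by 3.
    return w.count("a") % 3 == 0


def bounded_equivalence(hyp: Member, max_len: int = 8) -> Optional[str]:
    # Stand-in teacher: compare against the target on every string up to max_len.
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if hyp(w) != target(w):
                return w
    return None


hyp, S, E = lstar(["a", "b"], target, bounded_equivalence)
```

With this target, the first hypothesis has two states and the teacher returns the counterexample `aaa`; after its suffixes are added to `E`, the learner finds the three-state DFA with access strings `""`, `"a"`, and `"aa"`.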

Another approach is to find the smallest context-free grammar that generates every string in your set of examples, and nothing else. Finding the absolute smallest such grammar is NP-hard (this is known as the smallest grammar problem), but there are known heuristics that tend to give good solutions in practice. If you need something practical and have only positive examples, I'd recommend the Sequitur algorithm. This approach gives up generalization: there might be an even simpler/smaller CFG that generates every one of your example strings as well as some others. Still, for some applications it works reasonably well.
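Sequitur itself is an online algorithm that maintains digram-uniqueness and rule-utility invariants, which takes some care to implement. As a simpler sketch of the same smallest-grammar idea, here is Re-Pair, a different offline heuristic (not Sequitur): it repeatedly replaces the most frequent adjacent pair of symbols with a new nonterminal. The function names, the `R0`, `R1`, ... nonterminal naming, and the choice to give the start symbol one alternative per example are assumptions made for this illustration. The resulting grammar generates exactly the example strings and nothing else.

```python
from collections import Counter


def repair(examples: list[str]):
    """Re-Pair-style compression of a set of strings into a grammar.

    Returns (start_alternatives, rules): each start alternative is a list of
    symbols that expands to exactly one example, and rules maps each
    nonterminal name to the pair of symbols it rewrites to.
    """
    seqs = [list(s) for s in examples]  # terminals are single characters
    rules: dict[str, tuple[str, str]] = {}
    while True:
        # Count non-overlapping adjacent pairs (so "aaa" counts ("a", "a") once).
        counts: Counter = Counter()
        for seq in seqs:
            prev, prev_i = None, -2
            for i in range(len(seq) - 1):
                pair = (seq[i], seq[i + 1])
                if pair == prev and i == prev_i + 1:
                    continue  # overlaps the previous counted occurrence
                counts[pair] += 1
                prev, prev_i = pair, i
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no pair repeats, so a new rule would be used only once
        # Hypothetical naming: "R0", "R1", ... cannot clash with 1-char terminals.
        name = f"R{len(rules)}"
        rules[name] = pair
        # Replace occurrences of the pair left to right, without overlaps.
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                    out.append(name)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return seqs, rules


def expand(symbol: str, rules: dict[str, tuple[str, str]]) -> str:
    # Recursively expand a symbol back into the terminal string it derives.
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

For example, `repair(["abcabc", "abcab"])` yields the start alternatives `S -> R1 R1 | R1 R0` with rules `R1 -> R0 c` and `R0 -> a b`, and `expand` confirms that each alternative derives one of the input strings.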

D.W.