I'm looking for an algorithm that, given a finite set of words (in which the same symbol may occur several times), constructs a grammar representing a compressed version of the set: the grammar should generate exactly the words of the set, and nothing else, while taking less memory than the set itself.
I'm also looking for an algorithm that can update the grammar when I remove a word from the set.
What type of algorithm can do this?
Here is a concrete example:
Consider a string S = "abcdefghij" and the finite set of words "cdhij", "acdef", "fghi", "bcfgij", "defi".
I would like to construct a grammar that generates exactly this set of words (each word can be viewed as a concatenation of substrings, of any length, of the original string S).
Finally, I would like to remove a word from the set and then update the grammar accordingly.
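One way to make the example concrete is the Python sketch below. It is only an illustration of the representation, not the algorithm being asked for: each word becomes one rule whose right-hand side is a list of (start, end) spans into S, and removing a word deletes its rule. The names `to_spans` and `SpanSet` are hypothetical, and the greedy longest-match split is an assumption made for simplicity; it is not guaranteed to give the smallest encoding, and nothing here shares structure between different words.

```python
def to_spans(s, word):
    """Greedily split `word` into substrings of `s`, returned as (start, end) spans."""
    spans, i = [], 0
    while i < len(word):
        # Try the longest remaining prefix first, then shorter ones.
        for length in range(len(word) - i, 0, -1):
            start = s.find(word[i:i + length])
            if start != -1:
                break
        else:
            raise ValueError(f"symbol {word[i]!r} does not occur in the base string")
        spans.append((start, start + length))
        i += length
    return spans


class SpanSet:
    """Hypothetical container: one rule (a tuple of spans into s) per word."""

    def __init__(self, s, words=()):
        self.s = s
        self.rules = set()
        for w in words:
            self.add(w)

    def add(self, word):
        self.rules.add(tuple(to_spans(self.s, word)))

    def remove(self, word):
        # The greedy split is deterministic, so re-encoding finds the same rule.
        self.rules.discard(tuple(to_spans(self.s, word)))

    def words(self):
        # Expand every rule back into the word it generates.
        return sorted("".join(self.s[a:b] for a, b in rule) for rule in self.rules)


S = "abcdefghij"
g = SpanSet(S, ["cdhij", "acdef", "fghi", "bcfgij", "defi"])
print(to_spans(S, "cdhij"))   # [(2, 4), (7, 10)], i.e. "cd" + "hij"
g.remove("fghi")
print(g.words())              # ['acdef', 'bcfgij', 'cdhij', 'defi']
```

With this encoding, "bcfgij" becomes the three spans (1, 3), (5, 7), (8, 10). Whether such rules take less memory than the plain words depends on how long the shared substrings are, which is exactly the part I do not know how to optimize.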
Thank you.