Looking for a regexp generator based on a set of known string representative of a string set

Diez B. Roggisch deets at nospam.web.de
Fri Sep 8 13:22:14 EDT 2006


vbfoobar at gmail.com schrieb:
> Hello
> 
> I am looking for python code that takes as input a list of strings
> (most similar,
> but not necessarily, and rather short: say not longer than 50 chars)
> and that computes and outputs the python regular expression that
> matches
> these string values (not necessarily strictly, perhaps the code is able
> to determine
> patterns, i.e. families of strings...).

For matching the exact set, of course a concatenation can be used. 
However, this is of limited use, because then simple string find will 
suffice.

In general, this can't be done. It might be possible that the underlying 
structure of the language isn't regular at all - for example, the simple 
language of palindromes isn't.

And even if it was - the search space is just to large to explore. You 
could generate a bazillion matching expressions, but .* will always 
match - so how do you prevent the generator from converging towards that?

If all you have are those strings, you are better off trying to infer 
some structure yourself.

Diez



More information about the Python-list mailing list