Looking for a regexp generator based on a set of known string representative of a string set
Diez B. Roggisch
deets at nospam.web.de
Fri Sep 8 13:22:14 EDT 2006
vbfoobar at gmail.com schrieb:
> Hello
>
> I am looking for python code that takes as input a list of strings
> (most similar,
> but not necessarily, and rather short: say not longer than 50 chars)
> and that computes and outputs the python regular expression that
> matches
> these string values (not necessarily strictly, perhaps the code is able
> to determine
> patterns, i.e. families of strings...).
For matching the exact set, of course a concatenation can be used.
However, this is of limited use, because then simple string find will
suffice.
In general, this can't be done. It might be possible that the underlying
structure of the language isn't regular at all - for example, the simple
language of palindromes isn't.
And even if it was - the search space is just to large to explore. You
could generate a bazillion matching expressions, but .* will always
match - so how do you prevent the generator from converging towards that?
If all you have are those strings, you are better off trying to infer
some structure yourself.
Diez
More information about the Python-list
mailing list