Looking for a regexp generator based on a set of known string representative of a string set
bearophileHUGS at lycos.com
bearophileHUGS at lycos.com
Fri Sep 8 11:55:50 EDT 2006
vbfoobar at gmail.com:
> I am looking for python code that takes as input a list of strings
> (most similar,
> but not necessarily, and rather short: say not longer than 50 chars)
> and that computes and outputs the python regular expression that
> matches
> these string values (not necessarily strictly, perhaps the code is able
> to determine patterns, i.e. families of strings...).
This may be a very simple starting point:
>>> import re
>>> strings = ["foo", "bar", "$spam"]
>>> strings2 = "|".join(re.escape(s) for s in strings)
>>> strings2
'foo|bar|\\$spam'
>>> finds = re.compile(strings2)
But I don't know how well this may work with many longer strings.
If that's not enoug we can think about more complex solutions.
Bye,
bearophile
More information about the Python-list
mailing list