Looking for a regexp generator based on a set of known strings representative of a string set

James Stroud jstroud at mbi.ucla.edu
Fri Sep 8 22:55:07 EDT 2006


vbfoobar at gmail.com wrote:
> Hello
> 
> I am looking for Python code that takes as input a list of strings (mostly
> similar, but not necessarily, and rather short: say no longer than 50
> chars) and that computes and outputs the Python regular expression that
> matches these string values (not necessarily strictly; perhaps the code is
> able to determine patterns, i.e. families of strings...).
> 
> Thanks for any ideas
> 

I'm not sure of your application, but genomicists and proteomicists have
found that hidden Markov models (HMMs) can be very powerful for developing
pattern models. Perhaps have a look at "Biological Sequence Analysis" by
Durbin et al.
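For comparison, the crudest possible position-specific "profile" just records which characters occur at each position and turns each position into a character class. It is not an HMM (no probabilities, insertions, or deletions), and the function name `profile_regex` is hypothetical; this is a minimal sketch that assumes all input strings have the same length.

```python
import re


def profile_regex(strings):
    """Hypothetical helper: build a regex from equal-length strings by
    recording, for each position, the set of characters seen there.
    This is a crude, probability-free profile, NOT an HMM: it models no
    insertions or deletions."""
    if not strings:
        raise ValueError("need at least one string")
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        # Assumption of this sketch: no gaps, so lengths must agree.
        raise ValueError("this sketch assumes all strings have the same length")
    parts = []
    for column in zip(*strings):  # iterate position by position
        chars = sorted(set(column))
        if len(chars) == 1:
            # Every string agrees here: emit the literal character.
            parts.append(re.escape(chars[0]))
        else:
            # Escape each member so characters like ']' or '-' stay literal.
            parts.append("[" + "".join(re.escape(c) for c in chars) + "]")
    return "".join(parts)
```

For example, `profile_regex(["cat", "cot", "cut"])` gives `c[aou]t`. A profile HMM additionally weights the alternatives and models gaps, which is what lets it cope with strings of differing lengths.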

Also, a very cool regex-based algorithm was developed at IBM:

    http://cbcsrv.watson.ibm.com/Tspd.html
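A much simpler idea than the IBM tool (and not its algorithm) is to keep the characters on which all inputs agree and replace every other position with a `.` wildcard. The helper `wildcard_pattern` is hypothetical, and this is a minimal sketch that assumes equal-length strings.

```python
import re


def wildcard_pattern(strings):
    """Hypothetical helper: keep positions where every string agrees as
    literal characters and turn all other positions into '.' wildcards.
    Assumes equal-length strings; illustrative only, not the IBM algorithm."""
    if not strings:
        raise ValueError("need at least one string")
    if len({len(s) for s in strings}) != 1:
        raise ValueError("this sketch assumes all strings have the same length")
    parts = []
    for column in zip(*strings):  # one tuple of characters per position
        if len(set(column)) == 1:
            parts.append(re.escape(column[0]))  # conserved position
        else:
            parts.append(".")  # don't-care position
    return "".join(parts)
```

`wildcard_pattern(["ACGTA", "AGGTC", "ATGTA"])` returns `A.GT.`, which also matches unseen strings such as `AAGTG`; that looser generalization is the trade-off against listing each allowed character explicitly.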

But I think HMMs are the way to go. Check out HMMER, developed by Sean
Eddy and colleagues at WUSTL:

     http://hmmer.janelia.org/

     http://selab.janelia.org/people/eddys/

James

-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/



More information about the Python-list mailing list