Generating text from a regular expression

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Wed Mar 31 07:58:03 EDT 2010


En Wed, 31 Mar 2010 07:49:14 -0300, Nathan Harmston  
<iwanttobeabadger at googlemail.com> escribió:

> I have a slightly complicated/medium sized regular expression and I
> want to generate all possible words that it can match (to compare
> performance of regex against an acora based matcher). Using the
> regular expression as a grammar to generate all words in its language.
> I was wondering if this possible in Python or possible using anything.
> Google doesnt seem to give any obvious answers.

I've done this some time ago.
This recipe  
http://code.activestate.com/recipes/577041-merge-multiple-potentially-infinite-sorted-inputs-/
provides an infinite merge operation, required for effectively enumerating  
a regular language (with shorter strings first, else 'a*b' would get stuck  
repeating "a"s and never yielding any "b").
The example at the end shows the basics of how to use it (by hand) in a  
very simple case.
To enumerate all strings matching '(a|bc)*' one should invoke  
closure(product(['a'],['b','c'])), which gives:

'', 'a', 'aa', 'bc',
'aaa', 'abc', 'bca',
'aaaa', 'aabc', 'abca', 'bcaa', 'bcbc',
'aaaaa', 'aaabc', 'aabca', 'abcaa', 'abcbc', 'bcaaa', 'bcabc', 'bcbca',
...

Converting the former expression into the later function calls requires a  
parser (not shown -- and I won't be able to find it until Friday)

-- 
Gabriel Genellina




More information about the Python-list mailing list