Generating all permutations from a regexp

Nick Craig-Wood nick at craig-wood.com
Fri Dec 22 13:30:05 EST 2006


BJörn Lindqvist <bjourne at gmail.com> wrote:
>  With regexps you can search for strings matching it. For example,
>  given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
>  do the reverse, from a regexp generate all strings that could match
>  it.
> 
>  The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
>  "AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
> 
>  Is this possible to do? Obviously, for some regexps the set of matches
>  is unbounded (a list of everything that matches "*" would be very
>  unpractical), but how would you do it for simple regexps like the one
>  above?

A regular expression matcher uses a state machine to match strings.

What you want to do is do a traversal through that state machine
generating all possible matches at each point.

Luckily the python regexp matcher is written in python, so you have
access to the state machine directly if you want.

Take a look at sre*.py in the python library and you might be able to
work out what to do!  I had a brief look myself, and it
looked... complicated!

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick



More information about the Python-list mailing list