Enumerating Regular Expressions

blair.bethwaite at gmail.com blair.bethwaite at gmail.com
Tue May 9 04:30:58 EDT 2006


Dale Strickland-Clark wrote:
> Any regular expression that has an asterisk in it has an infinite number of
> possible matches.
>
> If it has two asterisks, that's an infinite number squared and that's a
> really big number.
>
> You wouldn't want to print them out.

We've been over this already.  Why are people getting stuck on infinite
regular languages?  I've made it quite clear that I'm only really
interested in doing this for finite languages, but that shouldn't
matter anyway.  Maybe I should phrase it differently; I want a tool
that can enumerate a regex, with support for generating each string
described by the regex in some predefined order.  So something like:
# pattern is the regex representing language L
reg_lang_gen = re.compile('pattern').enumerator()
while len(s) < n:
   s = reg_lang_gen.next()   # get next string in L
   ...

It's easy to imagine a problem where you'd want to enumerate a regex
that represents an infinite regular language.  E.g. a particularly dumb
bruteforce password cracker, you might want to keep generating strings
from the language of choice until you find one that matches your
encrpyted version, naturally you might want to stop if you hadn't found
it within 10 characters or so.

Cheers,
-B




More information about the Python-list mailing list