How to get the "longest possible" match with Python's RE module?

Bryan Olson fakeaddress at nowhere.org
Wed Sep 13 00:26:01 EDT 2006


Licheng Fang wrote:
> Basically, the problem is this:
> 
>>>> p = re.compile("do|dolittle")
>>>> p.match("dolittle").group()
> 'do'
> 
> Python's NFA regexp engine trys only the first option, and happily
> rests on that. There's another example:
> 
>>>> p = re.compile("one(self)?(selfsufficient)?")
>>>> p.match("oneselfsufficient").group()
> 'oneself'
> 
> The Python regular expression engine doesn't exaust all the
> possibilities, but in my application I hope to get the longest possible
> match, starting from a given point.
> 
> Is there a way to do this in Python?


A reply from Tim Peters reminded me that I once programmed a
simple special case of this problem. In particular, given a
lot of strings, my function returns a regexp that efficiently
matches the longest of prefix equal to any of the strings.
If the application is like the examples, where the target RE
is just the or of strings, the method should work well.

 
http://groups.google.com/group/comp.lang.python/msg/a22eb274845653e9


Now I'm thinking of a more general version, that handles all
true regular expressions.


-- 
--Bryan



More information about the Python-list mailing list