How to get the "longest possible" match with Python's RE module?

MonkeeSage MonkeeSage at gmail.com
Mon Sep 11 23:26:43 EDT 2006


Licheng Fang wrote:
> Basically, the problem is this:
>
> >>> p = re.compile("do|dolittle")
> >>> p.match("dolittle").group()
> 'do'

>From what I understand, this isn't python specific, it is the expected
behavior of that pattern in any implementation. You are using
alternation, which means "either, or", and you have the shorter
subexpression first, so the condition is satisfied by just 'do' and the
matching terminates.

> There's another example:
>
> >>> p = re.compile("one(self)?(selfsufficient)?")
> >>> p.match("oneselfsufficient").group()
> 'oneself'

Again, I don't think this has anything to do with python. You pattern
basically means "match 'one' whether it is followed by 'self' or not,
and whether it is followed by 'selfsufficient' or not". For this
particular example, you'd want something like
"one(self)?(sufficient)?".

I think you could construct a pattern that would do what you want in
python without any problem. If you post a (short) example of your data,
I'm sure someone could help you with it.

Regards,
Jordan




More information about the Python-list mailing list