How to get the "longest possible" match with Python's RE module?

gatti at dsdata.it
Tue Sep 12 04:36:40 EDT 2006


kondal wrote:

> This is the way regexps work; Python doesn't have anything to do with
> it. The engine starts parsing the data with the given pattern, returns
> the matched string according to the pattern, and doesn't go back to
> find other combinations.

I recently had the same problem in Java, using automatically
generated regular expressions to find the longest match. I failed on
patterns like ([A-Z][a-z])?([A-Z][A-Za-z]{1,10})? , which should match
the whole of "Abcdefg" as well as the whole of "AbCdefg" or "ABcdefg".
I found no systematic way to deal with these corner cases, and
unsystematic ways (mixing greedy and reluctant quantifiers) were too
complex.
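
For illustration, here is a small check of that pattern with Python's re
module rather than Java. The helper name matched_prefix is made up for
this example. With the greedy form, the first group takes "Ab" from
"Abcdefg", the second group then matches nothing, and the engine stops
there because the overall match has already succeeded.

```python
import re

# The pattern quoted above, with both optional groups greedy.
PATTERN = re.compile(r"([A-Z][a-z])?([A-Z][A-Za-z]{1,10})?")


def matched_prefix(text):
    """Return the prefix of text that re.match reports for PATTERN."""
    # Both groups are optional, so match() never returns None here.
    return PATTERN.match(text).group(0)


for word in ("Abcdefg", "AbCdefg", "ABcdefg"):
    print(word, "->", repr(matched_prefix(word)))
# Abcdefg -> 'Ab'
# AbCdefg -> 'AbCdefg'
# ABcdefg -> 'ABcdefg'
```
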
I ended up eliminating regular expressions completely and building a
dynamic programming parser that returns the set of all match lengths;
it wasn't hard and it should be even easier in Python.
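
A minimal sketch of what such a parser might look like in Python follows.
It is not the parser described above, only one possible reading of the
idea: each pattern element maps a set of start positions to the set of
all end positions it can reach, so duplicate positions are merged instead
of being explored again. The combinator names (char_class, sequence,
optional, match_lengths) are invented for this example.

```python
import string


def char_class(chars, lo=1, hi=1):
    """Match between lo and hi consecutive characters drawn from chars."""
    def step(text, starts):
        ends = set()
        for start in starts:
            if lo == 0:
                ends.add(start)
            pos = start
            while pos - start < hi and pos < len(text) and text[pos] in chars:
                pos += 1
                if pos - start >= lo:
                    ends.add(pos)
        return ends
    return step


def sequence(*parts):
    """Match each part in turn, feeding end positions to the next part."""
    def step(text, starts):
        positions = set(starts)
        for part in parts:
            positions = part(text, positions)
            if not positions:
                break
        return positions
    return step


def optional(part):
    """Match part or nothing, keeping both outcomes."""
    def step(text, starts):
        return set(starts) | part(text, starts)
    return step


def match_lengths(pattern, text):
    """Return every length of a prefix of text that pattern can match."""
    return pattern(text, {0})


UPPER = string.ascii_uppercase
LOWER = string.ascii_lowercase
LETTERS = string.ascii_letters

# Equivalent of ([A-Z][a-z])?([A-Z][A-Za-z]{1,10})?
PATTERN = sequence(
    optional(sequence(char_class(UPPER), char_class(LOWER))),
    optional(sequence(char_class(UPPER), char_class(LETTERS, 1, 10))),
)

for word in ("Abcdefg", "AbCdefg", "ABcdefg"):
    print(word, "->", max(match_lengths(PATTERN, word)))
# Abcdefg -> 7
# AbCdefg -> 7
# ABcdefg -> 7
```

Taking max() of the returned set gives the longest match; the full set is
useful when a later step needs to choose among the alternatives.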

Lorenzo Gatti



