How to get the "longest possible" match with Python's RE module?

Licheng Fang fanglicheng at gmail.com
Tue Sep 12 05:46:07 EDT 2006


gatti at dsdata.it wrote:
> kondal wrote:
>
> > This is how regexps work; Python doesn't have anything to do with
> > it. The engine starts parsing the data with the given pattern. It
> > returns the matched string according to the pattern and doesn't go
> > back to find other combinations.
>
> I've recently had the same problem in Java, using automatically
> generated regular expressions to find the longest match; I failed on
> cases like matching the whole of "Abcdefg", but also the whole of
> "AbCdefg" or "ABcdefg", with ([A-Z][a-z])?([A-Z][A-Za-z]{1,10})? .
> No systematic way to deal with these corner cases was available, and
> unsystematic ways (with greedy and reluctant quantifiers) were too
> complex.
> I ended up eliminating regular expressions completely and building a
> dynamic programming parser that returns the set of all match lengths;
> it wasn't hard and it should be even easier in Python.
>
> Lorenzo Gatti
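
To make the quoted corner case concrete, here is a minimal sketch in Python using the pattern from the message above. The helper name match_lengths is hypothetical, and it is a brute-force stand-in (one fullmatch call per candidate end position), not the dynamic programming parser Lorenzo describes; it only illustrates the idea of returning the set of all match lengths.

```python
import re

# Pattern taken from the quoted message.
PATTERN = re.compile(r"([A-Z][a-z])?([A-Z][A-Za-z]{1,10})?")

def match_lengths(pattern, text, start=0):
    """Return every length n such that text[start:start + n] matches
    the whole pattern (hypothetical helper, brute force)."""
    return {
        end - start
        for end in range(start, len(text) + 1)
        if pattern.fullmatch(text, start, end)
    }

# The backtracking engine stops at the first successful match it finds:
print(PATTERN.match("Abcdefg").group(0))   # Ab
print(PATTERN.match("AbCdefg").group(0))   # AbCdefg
print(PATTERN.match("ABcdefg").group(0))   # ABcdefg

# Enumerating all match lengths recovers the longest one:
lengths = match_lengths(PATTERN, "Abcdefg")
print(sorted(lengths))                     # [0, 2, 3, 4, 5, 6, 7]
print("Abcdefg"[:max(lengths)])            # Abcdefg
```

Only "Abcdefg" goes wrong with a plain match(): the first group greedily takes "Ab", the second group then matches nothing, and the engine has no reason to backtrack because the overall match already succeeded.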

Thanks. I think making use of the expressiveness of a CFG may be a
better idea.

Another question: my task is to find, in a given string, the substrings
that satisfy a particular pattern. That's why the first tool that
came to my mind was regular expressions. Parsers, however, only give a
yes/no answer for a given string. To find all substrings that match a
particular pattern I may have to try every substring, which may be an
impossible task.

How can I solve this problem?
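
One possible approach, sketched under assumptions: trying every substring is quadratic in the length of the string rather than impossible, so for short inputs a yes/no matcher can be driven directly. The sketch below uses re.fullmatch as the yes/no test; the function names longest_match_at and find_longest_matches are hypothetical, and the choice to report non-overlapping matches from left to right is an assumption about what is wanted.

```python
import re

def longest_match_at(pattern, text, start):
    """Return the largest end such that text[start:end] is a non-empty
    full match, or None if there is no such end."""
    for end in range(len(text), start, -1):   # longest candidates first
        if pattern.fullmatch(text, start, end):
            return end
    return None

def find_longest_matches(pattern, text):
    """Yield (start, end) spans of non-overlapping longest matches,
    scanning the text from left to right."""
    start = 0
    while start < len(text):
        end = longest_match_at(pattern, text, start)
        if end is None:
            start += 1          # nothing matches here, move on
        else:
            yield start, end
            start = end         # continue after the match

pattern = re.compile(r"([A-Z][a-z])?([A-Z][A-Za-z]{1,10})?")
text = "xx Abcdefg yy"
for start, end in find_longest_matches(pattern, text):
    print(start, end, text[start:end])     # 3 10 Abcdefg
```

This makes on the order of n*n fullmatch calls for a string of length n, so it is only a sketch for small inputs; a parser that returns all match lengths for a given start position would replace the inner loop.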



