How to get the "longest possible" match with Python's RE module?

gatti at dsdata.it gatti at dsdata.it
Tue Sep 12 06:06:05 EDT 2006


Licheng Fang wrote:

> Another question: my task is to find in a given string the substrings
> that satisfies a particular pattern. That's why the first tool that
> came to my mind is regular expression. Parsers, however, only give a
> yes/no answer to a given string. To find all substrings with a
> particular pattern I may have to try every substring, which may be an
> impossible task.

You can collect all successful parser results beginning from each index
in the string; this gives you all matches with that first index.
You could extend to multiple results general bottom-up context-free
language parsing like Earley or Tomita's algorithms; for reasonable
languages most locations can be excluded for most rules at the
beginning, with great performance improvements over trying over and
over again.

Lorenzo Gatti




More information about the Python-list mailing list