re module non-greedy matches broken

André Malo auch-ich-m at g-kein-spam.com
Tue Apr 5 02:42:16 EDT 2005


* "lothar" <lothar at ultimathule.nul> wrote:

> no - in the non-greedy regex
>   <1st-pat><not-1st-pat>*?<follow-pat>
> 
> <1st-pat>, <not-1st-pat> and <follow-pat> are arbitrarily complex patterns.

The "not" is the problem. Regex patterns are expressed positively by
definition (meaning you can say what you expect, but not what you
don't expect). In other words, regexps were invented to define (uh... regular)
sets, nothing more (in particular, you can't define "non-sets"). So the usual
way is to define the set you've called '<not-1st-pat>*?' and describe
it as a regex. Modern regular expression engines (which are no longer regular,
by the way ;-) offer shortcuts for this, such as negative lookahead
assertions.
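Here's a minimal Python sketch of what that looks like for your general form, assuming <1st-pat> and <follow-pat> are plain regex fragments without capturing groups. The helper `between` and the `<b>`/`</b>` delimiters are made up for illustration. Before each character is consumed, the lookahead `(?!...)` checks that <1st-pat> doesn't start there; that's the "not" written as something the engine can actually handle.

```python
import re


def between(first, follow, text):
    """Return all <first>...<follow> spans containing no further <first>.

    `first` and `follow` are hypothetical regex fragments standing in for
    <1st-pat> and <follow-pat>; they must not contain capturing groups,
    otherwise re.findall returns the groups instead of the whole match.
    """
    # <not-1st-pat>*? spelled out: one character at a time, lazily, and only
    # where <first> does not begin (negative lookahead assertion).
    not_first = rf"(?:(?!{first}).)*?"
    # DOTALL lets "." cross newlines, too.
    return re.findall(rf"{first}{not_first}{follow}", text, re.DOTALL)


text = "<b>one <b>two</b> three</b>"

# Plain non-greedy: the match starts at the leftmost <b> and happily runs
# over the inner <b>; "as short as possible" only affects where it ends.
print(re.findall(r"<b>.*?</b>", text))   # ['<b>one <b>two</b>']

# With the lookahead, a match can no longer contain a second <b>.
print(between("<b>", "</b>", text))      # ['<b>two</b>']
```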

I want to make clear that it isn't that nobody _wants_ to give advice on
how to express your pattern in general. The point is that there's no
real syntax for it. It depends on what your <1st-pat> and <follow-pat> look
like. Chances are it's not even expressible in one regex (that depends on
the complexity and kind of the sets they define).
Each pattern you write is specific to the particular use case.

That said, there are some common idioms for writing specific
forms of regexes. I also suggest Friedl's book ("Mastering Regular
Expressions"). Look for the C comment example, where your problem (more or
less) is discussed extensively. Actually, I really recommend reading the book
from start to end. It reads more like a story, but a good one ;-)
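Roughly, the C comment example boils down to something like the following Python sketch. The pattern is the "unrolled loop" form as I remember it from Friedl's discussion; check the book for the exact version and the reasoning behind it. Unlike `/\*.*?\*/`, it says positively what may appear between the delimiters (runs of non-stars, and stars as long as the next character isn't the closing slash), so it needs no non-greedy quantifier. The sample input is made up.

```python
import re

# Non-greedy version, for comparison; DOTALL lets "." run across newlines.
LAZY = re.compile(r"/\*.*?\*/", re.DOTALL)

# Unrolled loop, no lazy quantifier:
#   /\*                  opening "/*"
#   [^*]*                run of non-star characters (newlines included)
#   \*+                  one or more stars
#   (?:[^/*][^*]*\*+)*   after those stars: any char except "/" or "*",
#                        more non-stars, and stars again; repeated
#   /                    the slash that closes the comment
UNROLLED = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")

code = "a = 1; /* one */ b = 2; /** two **/ c = 3; /* multi\n * line */"

print(UNROLLED.findall(code))
print(LAZY.findall(code) == UNROLLED.findall(code))  # True
```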

> with character classes and negative character classes you do not need
> non-greediness anyway.

AFAICS one needs non-greedy regexps very, very rarely, if at all. I've been
playing with regular expressions for about ten years now and I've actually
used them, say, two or three times, and only for quick hacks. I've never
actually _needed_ them.
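To illustrate the point about negated character classes with the simplest case, a double-quoted string, here's a small Python sketch on a made-up input:

```python
import re

text = 'say "hello" and "goodbye"'

# Negated character class: a quote, any run of non-quotes, a quote.
# The class can never run past a closing quote, so greediness is harmless.
print(re.findall(r'"[^"]*"', text))  # ['"hello"', '"goodbye"']

# Non-greedy: same result here, because the engine tries the shortest
# extension first.
print(re.findall(r'".*?"', text))    # ['"hello"', '"goodbye"']

# Plain greedy ".*": swallows everything up to the last quote.
print(re.findall(r'".*"', text))     # ['"hello" and "goodbye"']
```

The two aren't identical in every case, by the way: without re.DOTALL, `.` doesn't match a newline while `[^"]` does, so for a quoted string spanning lines the negated class still matches and the non-greedy version doesn't.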

nd


