regex module, or don't work as expected

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Tue Jul 4 11:03:16 EDT 2006


In <44aa670d$0$7872$6e1ede2f at read.cnntp.org>, Fabian Holler wrote:

> Howdy,
> 
> 
> I have the following regex "iface lo[\w\t\n\s]+(?=(iface)|$)"
> 
> If "iface" doesn't follow after the regex "iface lo[\w\t\n\s]", the rest of
> the text should be selected.
> But ?=(iface) is ignored; the whole text is always selected.
> What is wrong?

The ``+`` after the character class means one or more of the characters
in the class.  If you have a text like:

  iface lox iface

Then it matches the space and the second ``iface`` as well, because the
space (``\s``) and word characters (``\w``) are part of the character class
and ``+`` is "greedy".  It consumes as many characters as possible, and the
rest of the regex is only tried once the class cannot match any further.
Here that happens at the end of the text, where the ``$`` alternative in
the lookahead succeeds, so the whole text is selected.

If you want to match non-greedy then put a ``?`` after the ``+``::

  iface lo[\w\t\n\s]+?(?=(iface)|$)

Now only "iface lox " is matched in the example above.
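
If you want to check this yourself, here is a minimal sketch using the
``re`` module on the example text above.  The variable names are made up
for illustration only; the two patterns are the ones from this thread.

```python
import re

text = "iface lox iface"

# Greedy: the class swallows everything up to the end of the text,
# and the lookahead is then satisfied by ``$``.
greedy = re.compile(r"iface lo[\w\t\n\s]+(?=(iface)|$)")

# Non-greedy: the class stops as soon as the lookahead can succeed,
# i.e. directly in front of the next "iface".
lazy = re.compile(r"iface lo[\w\t\n\s]+?(?=(iface)|$)")

greedy_match = greedy.search(text).group(0)
lazy_match = lazy.search(text).group(0)

print(repr(greedy_match))  # 'iface lox iface'
print(repr(lazy_match))    # 'iface lox '
```

The lookahead itself is never part of the match, so the non-greedy result
ends with the space in front of the second ``iface``.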

Ciao,
	Marc 'BlackJack' Rintsch


