Q: Python regular expression \Z delimiter

Milan.Gardian at leibinger.com
Tue Feb 6 05:20:58 EST 2001


Hello,

I have the following problem with Python's re module:

When using multiline mode (re.MULTILINE, alias re.M) together with
single-line mode (re.DOTALL, alias re.S), the regular expression should
treat the metacharacter \Z as matching only at the end of the string,
regardless of any embedded newlines in the processed string (unlike $).
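
A minimal sketch of the expected behavior described above, assuming a
current Python 3 interpreter and the semantics documented for the re
module (\Z matches only at the end of the string; $ in multiline mode
also matches just before each newline). The helper name first_match is
hypothetical and exists only for illustration. Results on other
interpreter versions may differ.

```python
import re


def first_match(pattern, text):
    """Return group 1 of the first match (hypothetical helper for illustration)."""
    match = re.compile(pattern, re.M | re.S).search(text)
    return match.group(1) if match else None


txt = "Hello\nWorld\n"

# \Z is documented to match only at the very end of the string, so the
# lazy .*? has to expand across the embedded newline to reach it.
whole = first_match(r'(.*?\Z)', txt)

# In multiline mode, $ also matches just before each newline, so the
# lazy .*? can stop at the end of the first line.
first_line = first_match(r'(.*?$)', txt)

print(repr(whole))
print(repr(first_line))
```

Printing with repr() makes the embedded newlines visible, which makes it
easier to see exactly where each match ended.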

Unfortunately, it does not work this way for me. Please consider the
following examples:
---
#Perl: r1.pl
$txt = "Hello\nWorld\n";
$txt =~ /(.*?\Z)/ms;
print "$1\n";
---
#Python: r1.py
import re
txt = "Hello\nWorld\n"
reg = re.compile(r'(.*?\Z)', re.M | re.S)
res = reg.search(txt)
print(res.group(1))
---
They should both produce the same result because they use the same
regular expression with the same modifiers (ms). Perl behaves as
expected (matches the string until the end):
    C:\Temp>r1.pl
    Hello
    World

    C:\Temp>

On the other hand, Python behaves differently (matches the string only
until the first line-delimiter):
    C:\Temp>r1.py
    Hello

    C:\Temp>

The \Z metacharacter evidently does not match at the end of the string
(as it should), but at the end of the line (i.e. it behaves exactly as
'$' should and does in multiline mode). When using the '$' anchor
instead, both scripts behave alike:
---
#Perl: r2.pl
$txt = "Hello\nWorld\n";
$txt =~ /(.*?$)/ms;
print "$1\n";
---
#Python: r2.py
import re
txt = "Hello\nWorld\n"
reg = re.compile(r'(.*?$)', re.M | re.S)
res = reg.search(txt)
print(res.group(1))
---
    C:\Temp>r2.pl
    Hello

    C:\Temp>r2.py
    Hello

Could anybody explain this behavior to me, please? Perhaps I am doing
something wrong, but at the moment this looks to me like a bug in
Python's implementation of re.

Regards,
	Milan G.




