Python 2.2 re bug?

Pedro Rodriguez pedro_rodriguez at club-internet.fr
Sun Aug 25 06:51:53 EDT 2002


On Sat, 24 Aug 2002 23:46:10 +0200, Travis Shirk wrote:


> Hi,
> 
> I'm running into what looks to be a bug in the python 2.2 re module.
> These examples should demonstrate the problem.
> 
> Using Python 1.5.2:
> import re;
> data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
> data1 = re.compile(r"\xFF\x00([\xE0-\xFF])").sub(r"\xFF\1", data); print data1
> '\377\340\323\323\344\225\377\000\000\021\377\365'
> 
> 
> This output is exactly what I expect, but now see what happens in 2.2.1:
> import re;
> data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
> data1 = re.compile(r"\xFF\x00([\xE0-\xFF])").sub(r"\xFF\1", data); print data1
> '\\xFF\xe0\xd3\xd3\xe4\x95\xff\x00\x00\x11\\xFF\xf5'
> 
> 

I ran into a similar issue, and I wonder whether your problem, like mine,
comes from the raw strings. Here is my reasoning, FWIW.

When you write something like:
    r"\x00"
this actually means:
    ['\\', 'x', '0', '0'] (try list(r"\x00"))
but
    "\x00"
means
    ['\x00'] (try list("\x00"))
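
The difference can be checked directly at the interpreter. This is only a
small sketch; the names raw and cooked are mine, not from the original post:

```python
# A raw string keeps the backslash escape as four separate characters.
raw = r"\x00"
# A normal string lets the parser turn the escape into one NUL character.
cooked = "\x00"

print(list(raw))      # the four characters: backslash, 'x', '0', '0'
print(list(cooked))   # a single NUL character
print(len(raw), len(cooked))
```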

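By using a raw string you prevent the Python parser from replacing the
escape sequence with the character it stands for. And the 're' module is
not obliged to do this substitution for you, since it has its own uses for
'\'. That is what the '\\xFF' in your 2.2.1 output looks like: the four
characters of the replacement were inserted literally.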

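So you should probably fix your expression by (carefully) replacing:
    data1 = re.compile(r"...").sub(r"...")
with
    data1 = re.compile("...").sub("...")
in both the 1.5.2 and 2.x versions. Carefully, because without the r
prefix a group reference such as \1 has to be written "\\1"; a bare "\1"
is the character with code 1.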
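
Here is a sketch of what the fix might look like. It makes one assumption
that is not in the original post: it is written for a current Python 3,
where byte data like this is a bytes object rather than a str. The names
pattern, replacement and result are mine.

```python
import re

data = b"\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"

# No r prefix: the parser turns \xFF, \x00 and \xE0 into real bytes
# before the re module ever sees the pattern.
pattern = re.compile(b"\xFF\x00([\xE0-\xFF])")

# The group reference needs a doubled backslash so that re still
# receives the two characters '\' and '1'.
replacement = b"\xFF\\1"

result = pattern.sub(replacement, data)
print(repr(result))

# The pitfall: without the doubled backslash, "\1" is an octal escape.
print(b"\1" == b"\x01")
```

The result has the same bytes as the 1.5.2 output quoted above, with each
FF 00 pair that precedes a byte in the E0-FF range collapsed to FF.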

HTH,
Pedro



