Python 2.2 re bug?
Travis Shirk
travis at puddy.lan.kerrgulch.net
Sun Aug 25 16:31:51 EDT 2002
> On Sat, 24 Aug 2002 23:46:10 +0200, Travis Shirk wrote:
>>
>> Using Python 1.5.2:
>> import re
>> data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
>> data1 = re.compile(r"\xFF\x00([\xE0-\xFF])").sub(r"\xFF\1", data)
>> print data1
>> '\377\340\323\323\344\225\377\000\000\021\377\365'
>>
>>
>> This output is exactly what I expect, but now see what happens in 2.2.1:
>> import re
>> data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
>> data1 = re.compile(r"\xFF\x00([\xE0-\xFF])").sub(r"\xFF\1", data)
>> print data1
>> '\\xFF\xe0\xd3\xd3\xe4\x95\xff\x00\x00\x11\\xFF\xf5'
>>
>>
Pedro Rodriguez <pedro_rodriguez at club-internet.fr> wrote:
> I ran into a similar issue, and I wonder whether your problem, like
> mine, comes from the raw string handling. Here is my reasoning, FWIW.
> When you write something like :
> r"\x00"
> this actually means :
> ['\\', 'x', '0', '0'] (use list(r"\x00"))
> but
> "\x00"
> means
> ['\x00'] (using list("\x00"))
> By using raw string you prevent the python parser from replacing the
> proper character in the string. And the 're' module isn't supposed to do
> this kind of substitution, it has its own things to do with '\'.
> So you should probably fix your expression by - carefully - replacing :
> data1 = re.compile(r"...").sub(r"...")
> with
> data1 = re.compile("...").sub("...")
> in both the 1.5.2 and 2.x versions.
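
The raw-string point above is easy to check directly. A minimal sketch, written in Python 3 syntax (an assumption, since the thread itself concerns 1.5.2 and 2.2.1); the names `raw` and `cooked` are only for illustration:

```python
# A raw literal keeps the backslash sequence as four separate characters.
raw = r"\x00"
# A normal literal lets the Python parser turn the escape into one NUL character.
cooked = "\x00"

print(list(raw))               # the individual characters of the raw literal
print(list(cooked))            # the single character of the normal literal
print(len(raw), len(cooked))   # lengths differ: 4 versus 1
```

So whatever receives the raw literal sees a backslash followed by `x00`, and it is up to that consumer whether to interpret the escape.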
Okay, to clarify: 1.5.2 works for me as expected.
I need r"" in both the compile and sub arguments because
both are regular expressions. If I make both regular strings I don't
get the duplicated \\ characters, but then the \1 in the sub argument
does not refer to group one of the compiled regex. Not that I would
expect it to.
The bottom line is that the behavior differs between 1.5.2 and 2.2.1,
and unless there is a workaround, 2.2.1 seems broken.
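
One possible workaround, sketched here under assumptions rather than tested on 2.2.1: avoid relying on the replacement template to expand \xFF at all. Either write the 0xFF byte with a normal (non-raw) escape and double the backslash of the group reference, or pass a function as the replacement. The sketch uses Python 3 bytes literals, which is an assumption (on 1.5.2 and 2.2.1 these would be plain strings), and `keep_group` is a hypothetical helper name:

```python
import re

data = b"\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"

# The pattern can stay raw: the regex engine itself understands \xFF escapes.
pattern = re.compile(rb"\xFF\x00([\xE0-\xFF])")

# Option 1: non-raw replacement. "\xFF" becomes the real byte in the literal,
# and "\\1" reaches the re module as backslash + "1", i.e. a group reference.
data1 = pattern.sub(b"\xFF\\1", data)

# Option 2: a replacement function, which bypasses template escapes entirely.
def keep_group(match):
    # Rebuild the match without the NUL byte between 0xFF and group 1.
    return b"\xFF" + match.group(1)

data2 = pattern.sub(keep_group, data)

print(data1)
print(data1 == data2)
```

The function form is more verbose, but it does not depend on how a given version of the re module treats backslash escapes in the replacement string.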
Travis
--
Travis Shirk <travis at pobox dot com>