[PyPy-issue] [issue531] string_escape codec problems
Leonardo Santagada
pypy-dev-issue at codespeak.net
Tue May 11 18:41:33 CEST 2010
New submission from Leonardo Santagada <santagada at gmail.com>:
I've noticed some problems with the string_escape codec. For example:
>>>> '\\0f'.decode('string_escape')
Traceback (most recent call last):
File "<console>", line 1, in <module>
ValueError: invalid literal for int(): 0f
>>>> '\\9'.decode('string_escape')
Traceback (most recent call last):
File "<console>", line 1, in <module>
ValueError: invalid literal for int(): 9
So I tried to fix it, but I don't think I found the right file that implements it. Here is my unfinished patch:
Index: pypy/module/_codecs/test/test_codecs.py
===================================================================
--- pypy/module/_codecs/test/test_codecs.py (revision 74383)
+++ pypy/module/_codecs/test/test_codecs.py (working copy)
@@ -273,7 +273,6 @@
         assert u"\u0663".encode("raw-unicode-escape") == "\u0663"
     def test_escape_decode(self):
-
         test = 'a\n\\b\x00c\td\u2045'.encode('string_escape')
         assert test.decode('string_escape') =='a\n\\b\x00c\td\u2045'
         assert '\\077'.decode('string_escape') == '?'
@@ -281,7 +280,19 @@
         assert '\\253'.decode('string_escape') == chr(0253)
         assert '\\312'.decode('string_escape') == chr(0312)
+    def test_escape_decode_wrap_around(self):
+        assert '\\400'.decode('string_escape') == chr(0)
+    def test_escape_decode_ignore_invalid(self):
+        assert '\\9'.decode('string_escape') == '\\9'
+        assert '\\01'.decode('string_escape') == chr(01)
+        assert '\\0f'.decode('string_escape') == chr(0) + 'f'
+        assert '\\08'.decode('string_escape') == chr(0) + '8'
+
+    def test_escape_decode_x(self):
+        #XXX Finish tests
+        pass
+
     def test_decode_utf8_different_case(self):
         constant = u"a"
         assert constant.encode("utf-8") == constant.encode("UTF-8")
Index: pypy/module/_codecs/app_codecs.py
===================================================================
--- pypy/module/_codecs/app_codecs.py (revision 74383)
+++ pypy/module/_codecs/app_codecs.py (working copy)
@@ -183,16 +183,23 @@
                     res += '\a'
                 elif data[i] == 'v':
                     res += '\v'
-                elif '0' <= data[i] <= '9':
+                elif '0' <= data[i] <= '7':
+                    val = int(data[i], 8)
+                    if i + 1 < l and ('0' <= data[i + 1] <= '7'):
+                        i += 1
+                        val = (val << 3) + int(data[i], 8)
+                    if i + 1 < l and ('0' <= data[i + 1] <= '7'):
+                        i += 1
+                        val = (val << 3) + int(data[i], 8)
                     # emulate a strange wrap-around behavior of CPython:
                     # \400 is the same as \000 because 0400 == 256
-                    octal = data[i:i+3]
-                    res += chr(int(octal, 8) & 0xFF)
-                    i += 2
+                    res += chr(val & 0xFF)
                 elif data[i] == 'x':
                     hexa = data[i+1:i+3]
                     res += chr(int(hexa, 16))
                     i += 2
+                else:
+                    res += '\\' + data[i]
         else:
             res += data[i]
         i += 1
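
For reference, the octal handling the patch is aiming for can be sketched as a standalone function. This is only an illustration of the intended behavior, not the code in app_codecs.py: the names decode_octal_escape and decode_octal_only are made up for this example, it handles only octal escapes and unknown escapes (so \n, \t, \x and friends are deliberately left out), and it passes a trailing backslash through unchanged.

```python
def decode_octal_escape(data, i):
    """Decode an octal escape whose first digit is at data[i].

    Hypothetical helper for illustration. Returns the decoded
    character and the index of the first unconsumed character.
    """
    val = 0
    digits = 0
    # Consume at most three octal digits, stopping at the first non-octal one.
    while digits < 3 and i < len(data) and '0' <= data[i] <= '7':
        val = (val << 3) + int(data[i], 8)
        i += 1
        digits += 1
    # Wrap around to one byte: \400 is 0400 == 256, which becomes 0.
    return chr(val & 0xFF), i


def decode_octal_only(data):
    """Decode octal escapes in data; leave every other escape untouched."""
    res = []
    i = 0
    while i < len(data):
        if data[i] != '\\' or i + 1 == len(data):
            # Ordinary character (or a trailing backslash, kept as-is here).
            res.append(data[i])
            i += 1
        elif '0' <= data[i + 1] <= '7':
            ch, i = decode_octal_escape(data, i + 1)
            res.append(ch)
        else:
            # Not an octal escape (e.g. \9): keep the backslash and the char.
            res.append('\\' + data[i + 1])
            i += 2
    return ''.join(res)
```

With this sketch, decode_octal_only('\\0f') gives chr(0) + 'f' and decode_octal_only('\\9') gives '\\9' back unchanged, which is what the new tests above expect from the codec.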
----------
effort: ???
messages: 1721
nosy: pypy-issue, santagada
priority: bug
release: ???
status: unread
title: string_escape codec problems
_______________________________________________________
PyPy development tracker <pypy-dev-issue at codespeak.net>
<https://codespeak.net/issue/pypy-dev/issue531>
_______________________________________________________