[issue29990] Range checking in GB18030 decoder

Xiang Zhang report at bugs.python.org
Wed Apr 5 23:23:12 EDT 2017


Xiang Zhang added the comment:

The table on Wikipedia is somewhat complex. I found ftp://ftp.software.ibm.com/software/globalization/documents/gb18030m.pdf, and the table in it is the same as the one at https://pan.baidu.com/share/link?shareid=2606985291&uk=3341026630 (except for 0x80), but in English. I agree with Ma Lin that byte sequences like b'\x81\x30\xFF\x30' are invalid.

With the current implementation, you can see that the sequence is decoded anyway and does not round-trip:

>>> invalid = b'\x81\x30\xff\x30'
>>> invalid.decode('gb18030').encode('gb18030') == invalid
False
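
For reference, here is a minimal sketch of the range check being discussed. It assumes the four-byte layout of GB18030: first and third bytes in 0x81-0xFE, second and fourth bytes in 0x30-0x39. The helper name is hypothetical and is not part of CPython; it only checks byte ranges and does not check whether the code point is actually assigned.

```python
def is_valid_gb18030_four_byte(seq: bytes) -> bool:
    """Return True if seq has the byte ranges of a GB18030 four-byte sequence.

    Hypothetical helper for illustration only; it checks byte ranges,
    not whether the sequence maps to an assigned code point.
    """
    if len(seq) != 4:
        return False
    b1, b2, b3, b4 = seq  # iterating over bytes yields ints
    return (
        0x81 <= b1 <= 0xFE      # first byte: lead byte range
        and 0x30 <= b2 <= 0x39  # second byte: ASCII digit range
        and 0x81 <= b3 <= 0xFE  # third byte: same range as first
        and 0x30 <= b4 <= 0x39  # fourth byte: ASCII digit range
    )


# The sequence from this issue has 0xFF as its third byte, out of range.
print(is_valid_gb18030_four_byte(b'\x81\x30\xff\x30'))
# A sequence whose four bytes are all in range passes the check.
print(is_valid_gb18030_four_byte(b'\x81\x30\x81\x30'))
```

Under these assumptions the third byte 0xFF falls outside 0x81-0xFE, which is why the sequence above should be rejected rather than decoded.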

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue29990>
_______________________________________

