[issue17906] JSON should accept lone surrogates

Serhiy Storchaka report at bugs.python.org
Sun May 5 13:45:06 CEST 2013


Serhiy Storchaka added the comment:

After investigating the problem more deeply, I see that a new parameter is not needed. RFC 4627 makes no exception for the range 0xD800-0xDFFF, so the decoder must accept lone surrogates, both escaped and unescaped. Non-BMP characters may be represented as an escaped surrogate pair, so an escaped surrogate pair may be decoded as a single non-BMP character, while an unescaped surrogate pair should not be.
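
To illustrate the pairing rule, here is a minimal sketch of how two \uXXXX escapes could be combined. The helper name combine_surrogates is hypothetical and is not taken from the patch; it only shows the standard UTF-16 arithmetic and the range check on the second escape.

```python
def combine_surrogates(high, low):
    """Return the non-BMP code point for a valid surrogate pair, else None.

    Hypothetical helper for illustration only; not part of the json module.
    """
    # A pair is valid only if the first value is a high surrogate
    # and the second value is a low surrogate.
    if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    # Otherwise the first value is a lone surrogate and must be kept as is.
    return None


pair = combine_surrogates(0xD834, 0xDD20)        # valid pair
not_a_pair = combine_surrogates(0xD834, 0x0079)  # high surrogate followed by "y"
```

With this check in place, a high surrogate followed by an ordinary escape stays a lone surrogate rather than being merged with whatever follows it.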

Here is a patch with which the JSON decoder accepts encoded lone surrogates. It also fixes a bug where the Python implementation decodes "\\ud834\\u0079x" as "\U0001d179".
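
The intended behaviour can be checked with a short script. This is a sketch that assumes an interpreter whose json module already includes the fix; the variable names are illustrative only.

```python
import json

# An escaped lone surrogate is accepted and kept as a single code point.
lone = json.loads('"\\ud834"')

# An escaped surrogate pair is decoded as one non-BMP character.
decoded_pair = json.loads('"\\ud834\\udd20"')

# A high surrogate followed by a non-surrogate escape must not be merged:
# the surrogate stays lone and "\u0079" decodes to "y".
mixed = json.loads('"\\ud834\\u0079x"')

# An unescaped surrogate pair (two raw code points in the input string)
# is passed through unchanged and is not combined.
raw_pair = json.loads('"\ud834\udd20"')
```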

----------
keywords: +patch
stage: needs patch -> patch review
title: Add a string error handler to JSON encoder/decoder -> JSON should accept lone surrogates
type: enhancement -> behavior
versions: +Python 2.7, Python 3.3
Added file: http://bugs.python.org/file30130/json_decode_lone_surrogates.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue17906>
_______________________________________

