[issue9804] ascii() does not always join surrogate pairs

Antoine Pitrou report at bugs.python.org
Thu Sep 9 01:04:53 CEST 2010


Antoine Pitrou <pitrou at free.fr> added the comment:

How about the following solution:

>>> def a(s):
...    s = s.encode('unicode-escape').decode('ascii')
...    s = s.replace("'", r"\'")
...    return "'" + s + "'"
... 
>>> s = "'\0\"\n\r\t abcd\x85é\U00012fff\U0001D121xxx\uD800."
>>> print(ascii(s)); print(a(s)); print(repr(s))
'\'\x00"\n\r\t abcd\x85\xe9\U00012fff\ud834\udd21xxx\ud800.'
'\'\x00"\n\r\t abcd\x85\xe9\U00012fff\U0001d121xxx\ud800.'
'\'\x00"\n\r\t abcd\x85é\U00012fff𝄡xxx\ud800.'


(I think I've included everything:
- normal chars
- control chars
- one-byte non-ASCII
- two-byte non-ASCII (and lone surrogate)
- printable and non-printable surrogate pairs
- single and double quotes)
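
The cases listed above can also be checked in a standalone script. This is only a sketch under a few assumptions: the name escaped_repr and the round-trip check through ast.literal_eval are illustrative and not part of the proposal, and the é in the sample string is written as \xe9 so the source file stays pure ASCII.

```python
import ast


def escaped_repr(s):
    """ASCII-only, single-quoted representation of s (same logic as a() above)."""
    # unicode-escape handles backslashes, control chars and non-ASCII chars;
    # non-BMP characters come out as one \UXXXXXXXX escape.
    body = s.encode('unicode-escape').decode('ascii')
    # unicode-escape leaves single quotes alone, so escape them by hand.
    body = body.replace("'", r"\'")
    return "'" + body + "'"


# Same sample as in the session above, with the e-acute spelled as \xe9.
sample = "'\0\"\n\r\t abcd\x85\xe9\U00012fff\U0001D121xxx\uD800."
result = escaped_repr(sample)

# The result contains only ASCII characters...
assert all(ord(c) < 128 for c in result)
# ...and evaluates back to the original string, lone surrogate included.
assert ast.literal_eval(result) == sample
print(result)
```

Note that, unlike ascii() and repr(), this sketch always uses single quotes as delimiters, even when the string contains single quotes and no double quotes.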

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9804>
_______________________________________

