[issue9804] ascii() does not always join surrogate pairs
Antoine Pitrou
report at bugs.python.org
Thu Sep 9 01:04:53 CEST 2010
Antoine Pitrou <pitrou at free.fr> added the comment:
How about the following solution:
>>> def a(s):
...     s = s.encode('unicode-escape').decode('ascii')
...     s = s.replace("'", r"\'")
...     return "'" + s + "'"
...
>>> s = "'\0\"\n\r\t abcd\x85é\U00012fff\U0001D121xxx\uD800."
>>> print(ascii(s)); print(a(s)); print(repr(s))
'\'\x00"\n\r\t abcd\x85\xe9\U00012fff\ud834\udd21xxx\ud800.'
'\'\x00"\n\r\t abcd\x85\xe9\U00012fff\U0001d121xxx\ud800.'
'\'\x00"\n\r\t abcd\x85é\U00012fff𝄡xxx\ud800.'
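
The first two output lines differ only in how U+1D121 is written: ascii() emits the two UTF-16 surrogates separately (\ud834\udd21), while a() emits the joined code point (\U0001d121). As a rough sketch of the arithmetic that relates the two forms (the helper name surrogate_pair is hypothetical and is not part of any patch on this issue):

```python
def surrogate_pair(cp):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000             # 20-bit value to split
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = surrogate_pair(0x1D121)
print("\\u%04x\\u%04x" % (high, low))  # prints \ud834\udd21
```
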
(I think I've included everything:
- normal chars
- control chars
- one-byte non-ASCII
- two-byte non-ASCII (and lone surrogate)
- printable and non-printable surrogate pairs
- single and double quotes)
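
The function from the session above can also be written as a standalone script with a few spot checks. This is only a sketch of the proposed approach, not the eventual fix; the name ascii_joined is made up for illustration, and the checks assume a build where a non-BMP character is a single code point in the str.

```python
def ascii_joined(s):
    """ASCII-only, repr-like rendering of s built on the unicode-escape codec."""
    # unicode-escape turns every non-ASCII or control character into an
    # escape sequence and writes non-BMP characters as one \UXXXXXXXX.
    body = s.encode("unicode-escape").decode("ascii")
    # The codec leaves single quotes alone, so escape them by hand.
    body = body.replace("'", r"\'")
    return "'" + body + "'"


print(ascii_joined("\U0001D121"))  # joined non-BMP character
print(ascii_joined("it's"))        # single quote is escaped
print(ascii_joined("\ud800"))      # lone surrogate is kept as an escape
```

Unlike ascii(), this always wraps the result in single quotes; ascii() switches to double quotes when the string contains a single quote but no double quote.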
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9804>
_______________________________________