[pypy-issue] Issue #2618: incorrect "surrogatepass" encoding with pypy3.5-5.8.0 (pypy/pypy)
Cosimo Lupo
issues-reply at bitbucket.org
Wed Jul 26 13:12:46 EDT 2017
New issue 2618: incorrect "surrogatepass" encoding with pypy3.5-5.8.0
https://bitbucket.org/pypy/pypy/issues/2618/incorrect-surrogatepass-encoding-with
Cosimo Lupo:
Hello,
I'm getting different encodings between CPython 3.5.3 and pypy3.5-5.8.0 when the input string contains surrogate code points.
When I round-trip the string 'Carrot \ud83e\udd55' through the "utf_16_be" encoding with errors="surrogatepass", CPython correctly gives me 'Carrot \U0001f955':
```
Python 3.5.3 (default, Jul 18 2017, 13:04:39)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 'Carrot \ud83e\udd55'.encode('utf_16_be', errors='surrogatepass')
b'\x00C\x00a\x00r\x00r\x00o\x00t\x00 \xd8>\xddU'
>>> 'Carrot \ud83e\udd55'.encode('utf_16_be', errors='surrogatepass').decode('utf_16_be')
'Carrot \U0001f955'
```
However, with PyPy3.5 5.8.0 and the same input and code, I get this:
```
Python 3.5.3 (a37ecfe5f142bc971a86d17305cc5d1d70abec64, Jul 25 2017, 16:48:07)
[PyPy 5.8.0-beta0 with GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``the future has just begun''
>>>> 'Carrot \ud83e\udd55'.encode('utf_16_be', errors='surrogatepass')
b'\x00C\x00a\x00r\x00r\x00o\x00t\x00 >\xd8U\xdd'
>>>> 'Carrot \ud83e\udd55'.encode('utf_16_be', errors='surrogatepass').decode('utf_16_be')
'Carrot 㻘嗝'
```
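Comparing the two byte strings, the PyPy output looks as if the two surrogate code units were written low byte first while the rest of the string is big-endian. The sketch below only reproduces that observation with `struct`; it says nothing certain about where the bug lives in PyPy. The helper name `pack_units` is made up for this illustration.

```python
import struct

def pack_units(text, surrogate_fmt):
    """Pack each code point as a big-endian 16-bit unit, except that
    surrogates (U+D800..U+DFFF) use `surrogate_fmt` instead.
    Hypothetical helper; assumes every code point fits in 16 bits."""
    out = []
    for ch in text:
        cp = ord(ch)
        fmt = surrogate_fmt if 0xD800 <= cp <= 0xDFFF else '>H'
        out.append(struct.pack(fmt, cp))
    return b''.join(out)

text = 'Carrot \ud83e\udd55'

# Everything big-endian: the bytes CPython produces above.
expected = pack_units(text, '>H')

# Surrogates little-endian: the bytes PyPy produces above.
swapped = pack_units(text, '<H')

print(expected)
print(swapped)
# Decoding the swapped bytes yields two unrelated BMP characters.
print(ascii(swapped.decode('utf_16_be')))
```

If the swapped bytes are decoded as "utf_16_be", the code units become U+3ED8 and U+55DD, which is consistent with the two CJK characters in the PyPy session.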
I'm on macOS 10.12.6 and compiled pypy3 from source, using the latest GCC 7.1.0 from Homebrew.
I haven't had the chance to try on Linux yet.
Thanks for your help.