[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

Thu Jun 20 14:24:03 EDT 2019

Karthikeyan Singaravelan <tir.karthi at gmail.com> added the comment:

This change seems to have caused the test failure reported in https://github.com/python-hyper/wsproto/issues/126. A minimal reproducer (/tmp/foo.py):

from codecs import getincrementaldecoder
decoder = getincrementaldecoder("utf-8")()
print(decoder.decode(b'f\xf1\xf6rd', False))

With commit 7a465cb5ee:

➜  cpython git:(7a465cb5ee) ./python.exe /tmp/foo.py
f

Before commit 7a465cb5ee:

➜  cpython git:(38f4e468d4) ./python.exe /tmp/foo.py
Traceback (most recent call last):
  File "/tmp/foo.py", line 3, in <module>
    print(decoder.decode(b'f\xf1\xf6rd', False))
  File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 1: invalid continuation byte
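
For reference, here is a small sketch of the behaviour I would expect from a strict incremental decoder, separating a truncated sequence (which may be buffered until more input arrives) from an invalid one (which should raise even with final=False). The helper name feed is only illustrative and is not part of codecs; on a build with 7a465cb5ee the second case would take the "no error raised" branch.

```python
from codecs import getincrementaldecoder


def feed(chunks, final=False):
    """Feed byte chunks to a fresh strict UTF-8 incremental decoder.

    Hypothetical helper for illustration; returns the text produced
    for each chunk.
    """
    decoder = getincrementaldecoder("utf-8")()
    return [decoder.decode(chunk, final) for chunk in chunks]


# Truncated sequence: 0xf1 starts a 4-byte character, so the decoder
# can hold it back and wait for the three continuation bytes.
print(feed([b"f\xf1", b"\x80\x80\x80"]))

# Invalid sequence: 0xf6 is not a continuation byte, so no later input
# can repair 0xf1 and a strict decoder is expected to raise right away.
try:
    feed([b"f\xf1\xf6rd"])
except UnicodeDecodeError as exc:
    print("raised:", exc.reason)
else:
    print("no error raised")
```

The first call corresponds to the case the incremental decoder is supposed to buffer; the second is the wsproto input from the reproducer above.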

----------
nosy: +xtreak

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue24214>
_______________________________________