Python 3.1.1 bytes decode with replace bug

Terry Reedy tjreedy at udel.edu
Sat Oct 24 16:20:07 EDT 2009


Joe wrote:

Please provide more information.

> The Python 3.1.1 documentation has the following example:

Where? I could not find it.

> >>> b'\x80abc'.decode("utf-8", "strict")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
>                     unexpected code byte
> >>> b'\x80abc'.decode("utf-8", "replace")
> '\ufffdabc'
> >>> b'\x80abc'.decode("utf-8", "ignore")
> 'abc'

> Strict and Ignore appear to work as per the documentation but replace
> does not.  Instead of replacing the values it fails:
> 
> >>> b'\x80abc'.decode('utf-8', 'replace')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "p:\SW64\Python.3.1.1\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 1: character maps to <undefined>

Which interpreter and system? With Python 3.1 (r31:73574, Jun 26 2009,
20:21:35) [MSC v.1500 32 bit (Intel)] on win32, in IDLE, I get

 >>> b'\x80abc'.decode('utf-8', 'replace') # pasted from above
'�abc'

as per the example.

> Is this a known bug with 3.1.1?

Did you search the issues list at bugs.python.org?
I did, and found nothing. A discrepancy between the docs (if the
example really is from the docs) and the behavior (if this really is
3.1) would be a bug, but more information is needed.
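
One thing worth checking, judging only from the quoted traceback: the
failing frame is in encodings\cp437.py and the exception is a
UnicodeEncodeError, not a UnicodeDecodeError. That suggests the decode
may have succeeded and the failure came when the interactive console
tried to display the result. A minimal sketch that separates the two
steps, assuming the console encoding is cp437 (the names console_ok and
safe are just for illustration):

```python
data = b'\x80abc'

# Step 1: decode. 'replace' substitutes U+FFFD for the invalid byte 0x80.
text = data.decode('utf-8', 'replace')

# Step 2: simulate what a console does when it displays the result.
# Assumption: the console encoding is cp437, as the traceback path suggests.
try:
    text.encode('cp437')
    console_ok = True
except UnicodeEncodeError:
    # cp437 has no mapping for U+FFFD, so display fails, not the decode.
    console_ok = False

# ascii() escapes every non-ASCII character, so its result can be
# printed on any console regardless of the console encoding.
safe = ascii(text)
print(safe)
```

If console_ok comes out False here, the decode behaves as documented and
only the display step is at fault.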

Terry Jan Reedy




