Python 3.1.1 bytes decode with replace bug

Joe JoeSalmeri at hotmail.com
Sat Oct 24 20:47:41 EDT 2009


> For the reason BK explained, the important difference is that I ran in
> the IDLE shell, which handles screen printing of unicode better ;-)

Something still does not seem right here to me.

In the example above the bytes were decoded as UTF-8 with the
replace option, so any bytes that were not valid UTF-8 were replaced,
and the resulting string is '\ufffdabc', as BK explained.  I understand
that the replace worked.
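
For reference, here is a minimal sketch of that decode step. The original
bytes are not repeated in this message, so b'\xffabc' is an assumed
stand-in (0xff can never appear in valid UTF-8); only the resulting string
'\ufffdabc' comes from the thread.

```python
# Assumed input: a stand-in for the original bytes, which are not
# shown in this message. 0xff is never a valid byte in UTF-8.
raw = b'\xffabc'

# 'replace' substitutes U+FFFD (REPLACEMENT CHARACTER) for each
# undecodable byte instead of raising UnicodeDecodeError.
decoded = raw.decode('utf-8', 'replace')

# ascii() escapes non-ASCII characters, so this print is safe on any console.
print(ascii(decoded))
```

Printing through ascii() sidesteps the console encoding entirely, which
keeps the decode step separate from whatever happens at print time.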

Now consider this:

Python 3.1.1 (r311:74483, Aug 17 2009, 16:45:59) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '\ufffdabc'
>>> print(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "p:\SW64\Python.3.1.1\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 0: character maps to <undefined>
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'

This too fails for the exact same reason (and doesn't involve decode).

In the original example I decoded as UTF-8, and in this example the
default encoding is UTF-8, so why is cp437 being used?
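
One way to narrow this down, offered as a sketch rather than a diagnosis:
print() encodes its output with sys.stdout.encoding, which is a separate
setting from sys.getdefaultencoding(), so comparing the two may show where
cp437 comes in. The variable names below are only for illustration, and
the value of sys.stdout.encoding will depend on the console in use.

```python
import sys

s = '\ufffdabc'

# Encoding used for str <-> bytes conversions when none is given.
default_enc = sys.getdefaultencoding()

# Encoding print() uses when writing to the console; it can differ
# from the default encoding, and may be None for some stream objects.
stdout_enc = getattr(sys.stdout, 'encoding', None)

print(default_enc, stdout_enc)

# Reproduce the failure without involving the console at all:
# the cp437 code page has no mapping for U+FFFD.
try:
    s.encode('cp437')
    cp437_ok = True
except UnicodeEncodeError:
    cp437_ok = False

# With an error handler the encode succeeds, substituting '?'.
fallback = s.encode('cp437', 'replace')
print(cp437_ok, fallback)
```

If the second value printed on the first line is cp437, that would be
consistent with the traceback above pointing into encodings\cp437.py.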

Thanks in advance for your assistance!








