[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

Martin v. Löwis report at bugs.python.org
Thu Jun 10 20:40:58 CEST 2010


Martin v. Löwis <martin at v.loewis.de> added the comment:

>> 7-zip encodes "à" (U+00e0) as 0x85 (1 byte), and "é" (U+00e9) as 0x82 (1 byte). I don't know this encoding.
>
> That's an old DOS code page used in Europe: CP850

There is a good chance that they use it because it is the OEM code page 
on the system.
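
The byte values quoted above can be checked against Python's built-in codecs. This is only a small sketch: the variable names are made up for illustration, and it says nothing about how 7-zip actually picks its encoding.

```python
# The two bytes 7-zip was reported to write for "à" and "é".
raw = bytes([0x85, 0x82])

# Decoded with the DOS/OEM code page CP850, they give the expected letters.
as_cp850 = raw.decode("cp850")

# Decoded with the Windows ANSI code page cp1252, the same bytes map to
# unrelated punctuation characters.
as_cp1252 = raw.decode("cp1252")

# For comparison, UTF-8 needs two bytes for each of these letters.
as_utf8 = "\u00e0\u00e9".encode("utf-8")

print(ascii(as_cp850), ascii(as_cp1252), as_utf8)
```

The same two bytes are meaningful under cp850 and garbage under cp1252, which is why a reader has to know the writer's code page to recover the file names.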

In any case, I think that both cp850 and cp1252 are inherently incorrect 
for tarfiles (despite these tools using them). tar is a POSIX thing, and 
these encodings have nothing to do with POSIX.

So using UTF-8 is a reasonable choice, IMO. The other reasonable choice 
would be ASCII.
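
To illustrate what choosing UTF-8 means in practice, here is a minimal sketch that passes encoding="utf-8" to tarfile.open() explicitly instead of relying on a platform default. The member name and payload are invented for the example, and GNU_FORMAT is picked only so that the name is stored directly in the header.

```python
import io
import tarfile

name = "\u00e0\u00e9.txt"   # hypothetical member name containing "à" and "é"
payload = b"hello"

# Write an in-memory archive, stating the encoding explicitly.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                  encoding="utf-8") as tar:
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# The header starts with the name field, so the raw bytes show the encoding.
header_name = buf.getvalue()[:len(name.encode("utf-8"))]

# Read the archive back with the same explicit encoding.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r", encoding="utf-8") as tar:
    names = tar.getnames()

# With ASCII as the encoding, such a name cannot be represented at all.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Passing the encoding on both the writing and the reading side makes the result independent of the system code page, whichever default is eventually chosen.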

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8784>
_______________________________________