[issue8784] tarfile/Windows: Don't use mbcs as the default encoding

Marc-Andre Lemburg report at bugs.python.org
Thu Jun 10 01:21:37 CEST 2010


Marc-Andre Lemburg <mal at egenix.com> added the comment:

Marc-Andre Lemburg wrote:
> 
> Marc-Andre Lemburg <mal at egenix.com> added the comment:
> 
> STINNER Victor wrote:
>>
>> STINNER Victor <victor.stinner at haypocalc.com> added the comment:
>>
>> I created a TAR archive with the 7-zip archiver of files with diacritics in their names (e.g. "é" and "à"). Then I opened the archive with WinRAR: the file names were not displayed correctly :-/
>>
>> 7-zip encodes "à" (U+00e0) as 0x85 (1 byte), and "é" (U+00e9) as 0x82 (1 byte). I don't know this encoding.
> 
> That's an old DOS code page used in Europe: CP850
> 
> http://en.wikipedia.org/wiki/Code_page_850

Looks like the cmd.exe on WinXP still uses it. At least on my German
WinXP it does for Python 2.3 and older. Starting with Python 2.4,
the behavior changed to use CP1252 instead:

D:\Python26>python
Python 2.6 (r26:66721, Oct  2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'àé'
u'\xe0\xe9'

D:\Python25>python
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'áé'
u'\xe1\xe9'

D:\Python24>python
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'àé'
u'\xe0\xe9'

D:\Python23>python
Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'àé'
u'\x85\x82'
>>>
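
For reference, the byte values in the sessions above can be reproduced with the standard codecs. This is only a minimal sketch, written for Python 3 rather than the 2.x interpreters shown here. The last step decodes the CP850 bytes as CP1252 purely as an illustration of how such names can end up garbled; which code page WinRAR actually assumes is not known from this thread.

```python
# Sketch (Python 3): how the same two characters map to bytes in each code page.
text = "\u00e0\u00e9"  # the characters "à" and "é"

cp850_bytes = text.encode("cp850")    # old DOS (OEM) code page
cp1252_bytes = text.encode("cp1252")  # Windows "ANSI" code page for Western Europe

print(cp850_bytes)   # b'\x85\x82'
print(cp1252_bytes)  # b'\xe0\xe9'

# Illustration only: reading CP850 bytes as if they were CP1252 yields
# different characters, i.e. mojibake. The choice of CP1252 for the
# reading side is an assumption, not something confirmed for WinRAR.
misread = cp850_bytes.decode("cp1252")
print(ascii(misread))  # '\u2026\u201a'
```

The CP850 result matches what 7-zip wrote and what Python 2.3 shows, while the CP1252 result matches the output of Python 2.4 and later.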

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8784>
_______________________________________

