[Python-Dev] Python 3.0.1 (io-in-c)

Jean-Paul Calderone exarkun at divmod.com
Wed Jan 28 20:17:46 CET 2009


On Wed, 28 Jan 2009 18:52:41 +0000, Paul Moore <p.f.moore at gmail.com> wrote:
>2009/1/28 "Martin v. Löwis" <martin at v.loewis.de>:
>> Well, first try to understand what the error *is*:
>>
>> py> unicodedata.name('\u0153')
>> 'LATIN SMALL LIGATURE OE'
>> py> unicodedata.name('£')
>> 'POUND SIGN'
>> py> ascii('£')
>> "'\\xa3'"
>> py> ascii('£'.encode('cp850').decode('cp1252'))
>> "'\\u0153'"
>>
>> So when Python reads the file, it uses cp1252. This is sensible - just
>> that the console uses cp850 doesn't change the fact that the "common"
>> encoding of files on your system is cp1252. It is an unfortunate fact
>> of Windows that the console window uses a different encoding from the
>> rest of the system (namely, the console uses the OEM code page, and
>> everything else uses the ANSI code page).
>
>Ah, I see. That is entirely obvious. The key bit of information is
>that the default io encoding is cp1252, not cp850. I know that in
>theory, I see the consequences often enough (:-)), but it isn't
>"instinctive" for me. And the simple "default encoding is system
>dependent" comment is not very helpful in terms of warning me that
>there could be an issue.
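
One way to sidestep the system-dependent default is to pass the encoding
explicitly when opening files. The following is only a minimal sketch:
the helper name write_then_read is made up for illustration, and it
assumes the file was written as cp850. It reproduces the mismatch shown
in the quoted session and shows that it goes away when both sides agree.

```python
import os
import tempfile


def write_then_read(text, write_encoding, read_encoding):
    """Write text with one encoding and read it back with another.

    Hypothetical helper, for illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Name the encoding explicitly instead of relying on the default.
        with open(path, "w", encoding=write_encoding) as f:
            f.write(text)
        with open(path, "r", encoding=read_encoding) as f:
            return f.read()
    finally:
        os.remove(path)


# Written as cp850 (OEM) but read as cp1252 (ANSI): POUND SIGN comes back
# as LATIN SMALL LIGATURE OE.
mismatched = write_then_read("\xa3", "cp850", "cp1252")

# Same encoding on both sides: the text survives the round trip.
matched = write_then_read("\xa3", "cp850", "cp850")
```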

It probably didn't help that the exception told you the error was in
the "charmap" codec.  It should have said "cp850" instead.  The fact
that cp850 is implemented in terms of "charmap" isn't very interesting.
The fact that the error occurred while encoding some text using "cp850"
is.
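
Until the message changes, calling code can put the requested codec name
back itself. This is a rough sketch, not a proposal for how the io layer
should do it; encode_naming_codec is a hypothetical name, and the only
assumption is that UnicodeEncodeError takes its usual five arguments
(encoding, object, start, end, reason).

```python
def encode_naming_codec(text, encoding):
    """Encode text; on failure, report the codec the caller asked for.

    Hypothetical helper, for illustration only.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.encoding is whatever name the codec machinery reported,
        # which may be the implementation detail "charmap". Re-raise
        # with the requested name, keeping the other details intact.
        raise UnicodeEncodeError(
            encoding, exc.object, exc.start, exc.end, exc.reason
        ) from exc


# POUND SIGN exists in cp850, so this succeeds.
pound = encode_naming_codec("\xa3", "cp850")

# LATIN SMALL LIGATURE OE does not, so this fails, naming cp850.
try:
    encode_naming_codec("\u0153", "cp850")
    reported = None
except UnicodeEncodeError as exc:
    reported = exc.encoding
```

The original exception stays attached as __cause__, so the "charmap"
detail is still there for anyone who does care about it.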

Jean-Paul

