[Python-Dev] Python 3.0.1 (io-in-c)

Paul Moore p.f.moore at gmail.com
Wed Jan 28 19:52:41 CET 2009


2009/1/28 "Martin v. Löwis" <martin at v.loewis.de>:
> Well, first try to understand what the error *is*:
>
> py> unicodedata.name('\u0153')
> 'LATIN SMALL LIGATURE OE'
> py> unicodedata.name('£')
> 'POUND SIGN'
> py> ascii('£')
> "'\\xa3'"
> py> ascii('£'.encode('cp850').decode('cp1252'))
> "'\\u0153'"
>
> So when Python reads the file, it uses cp1252. This is sensible - just
> that the console uses cp850 doesn't change the fact that the "common"
> encoding of files on your system is cp1252. It is an unfortunate fact
> of Windows that the console window uses a different encoding from the
> rest of the system (namely, the console uses the OEM code page, and
> everything else uses the ANSI code page).

Ah, I see. That is entirely obvious. The key bit of information is
that the default io encoding is cp1252, not cp850. I know that in
theory, and I see the consequences often enough (:-)), but it isn't
"instinctive" for me. And the simple "default encoding is system
dependent" comment does little to warn me that there could be an
issue.
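
For anyone else tripping over this, here is a minimal sketch of how the
two encodings can be compared and how the mismatch from Martin's session
arises. The variable names are just illustrative, and the values printed
depend on the machine; cp1252 and cp850 are what my Windows setup
reports, not something the code assumes.

```python
import locale
import sys

# Encoding that open() falls back to when no encoding= is passed
# (assuming UTF-8 mode is not enabled).
file_default = locale.getpreferredencoding(False)

# Encoding of the console stream; can be None if stdout has been replaced.
console = sys.stdout.encoding

print("file default:", file_default)
print("console:     ", console)

# Reproduce the mismatch: bytes written as cp850 (the OEM code page)
# but read back as cp1252 (the ANSI code page).
data = '£'.encode('cp850')        # the single byte 0x9C
misread = data.decode('cp1252')   # 0x9C is U+0153 in cp1252
print(ascii(misread))
```

Passing encoding= explicitly to open() avoids depending on either
default.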

I do think that more wording around encoding defaults would be useful
- as I said, I'll think about how best it could be made into a doc
patch. I suspect the best approach would be to have a section (maybe
in the docs for the codecs module) explaining all the details, and
then a cross-reference to that from the various places (open, io)
where default encodings are mentioned.

Paul.

>
> Furthermore, U+0153 does not exist in cp850 (i.e. the terminal doesn't
> support œ), hence the exception.
>
> Regards,
> Martin
>
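
As a rough illustration of that last point (a sketch only, using the
standard codecs and made-up variable names): U+0153 has no mapping in
cp850, so a strict encode raises, while an error handler such as
'backslashreplace' gives output the console can still show.

```python
text = '\u0153'  # LATIN SMALL LIGATURE OE

# A strict encode fails because cp850 has no byte for this character.
try:
    text.encode('cp850')
    encodable = True
except UnicodeEncodeError:
    encodable = False

# With an error handler the character is replaced by an ASCII escape.
escaped = text.encode('cp850', errors='backslashreplace')

print(encodable)
print(escaped)
```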

