print u"\u0432": why is this so hard? UnicodeEncodeError

"Martin v. Löwis" martin at v.loewis.de
Thu Apr 8 15:01:43 EDT 2004


Nelson Minar wrote:
> So when Python can't guess the encoding, it assumes that ASCII is the
> best it can do? Even as an American that annoys me;

In general, it uses the default encoding which, by default, is ASCII.

This was chosen after long discussions, which showed that any
other guess is wrong under likely circumstances (the specific
circumstances depending on what the guess is). If the guess is wrong,
you end up with mojibake (nonsense characters), which is very hard
to track back to its source.
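
To make the mojibake problem concrete, here is a minimal sketch. It uses
present-day Python 3 syntax rather than the Python 2 this thread is
about, and the sample name and the two encodings are picked only for
illustration.

```python
# Illustrative only: bytes written as UTF-8 but read back with a wrong guess.
name = "L\u00f6wis"                    # "Löwis", with o-umlaut as U+00F6
data = name.encode("utf-8")            # the bytes that actually get written

wrong_guess = data.decode("latin-1")   # decoding succeeds, but the text is garbage
right_guess = data.decode("utf-8")     # the correct encoding round-trips

# ascii() escapes non-ASCII characters, so printing is safe on any terminal.
print(ascii(wrong_guess))
print(ascii(right_guess))
```

Note that the wrong guess raises no error at all, which is why the
damage tends to show up far away from where it was caused.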

In the face of ambiguity, refuse the temptation to guess.

ASCII is the only guess that has no significant risk of ambiguity:
if something encodes successfully as ASCII, it would encode to the
very same byte sequence in nearly any other encoding.
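
A small sketch of that property, in Python 3 syntax; the list of
encodings is my own choice for illustration, and UTF-16 is included as
one of the exceptions covered by "nearly".

```python
# Illustrative only: a pure-ASCII string yields identical bytes in many encodings.
text = "Martin"
encodings = ["ascii", "latin-1", "utf-8", "cp1252", "koi8-r"]

encoded = {enc: text.encode(enc) for enc in encodings}
all_same = len(set(encoded.values())) == 1   # every encoding agrees

# UTF-16 is not ASCII-compatible: each character takes two bytes.
utf16_differs = text.encode("utf-16-le") != text.encode("ascii")

print(all_same, utf16_differs)
```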

> what do folks who
> need non-ASCII do in practice? Martin, what do you do when you write a
> Python script that prints your own name?

It depends. If I print to the terminal, I use Unicode. If I print to
XML, I use Unicode, and expect that the XML writer will pick some
encoding, using XML character references if the o-umlaut cannot be
encoded. If I print to HTML, I make sure an explicit META tag has
been added to denote the document as Latin-1, or I use &ouml;.
If I print to a log file, I explicitly use Latin-1, unless I know
that the encoding of that log file is meant to be UTF-8. And so on.
that the encoding of that log file is meant to be UTF-8. And so on.
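
As a rough sketch of those cases in Python 3 syntax: the variable names
are hypothetical, and the 'xmlcharrefreplace' error handler stands in
for whatever a real XML writer would do internally.

```python
# Illustrative only: choosing the encoding explicitly per output channel.
name = "Martin v. L\u00f6wis"

# XML/HTML: fall back to a numeric character reference when the target
# encoding (ASCII here) cannot represent the o-umlaut.
xml_bytes = name.encode("ascii", "xmlcharrefreplace")

# Log file: pick the encoding deliberately instead of relying on a default.
log_latin1 = (name + "\n").encode("latin-1")
log_utf8 = (name + "\n").encode("utf-8")

print(xml_bytes)
print(log_latin1)
print(log_utf8)
```

The same name ends up as three different byte strings, so the reader of
each output has to know which encoding was used.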

It is not that Python is making this complicated; it is complicated
by nature - until everybody switches to UTF-8, which may take another
20 years or so.

> I guess what I'd like is a way to set Python's default encoding and
> have that respected for files, terminals, etc. I'd also like some way
> to override the Unicode error mode. 'strict' is the right default, but
> I'd like the option to do 'ignore' or 'replace' globally.

Submit a patch that does that. I very much prefer to fix errors instead
of ignoring them.
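
For reference, the error modes can already be chosen per call. A minimal
sketch in Python 3 syntax, using the character from the subject line;
this shows the per-call form only, not the global switch asked for
above.

```python
# Illustrative only: the three error modes applied to one un-encodable character.
ch = "\u0432"   # CYRILLIC SMALL LETTER VE, from the subject line

try:
    ch.encode("ascii")                 # 'strict' is the default mode
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True               # the error from the subject line

replaced = ch.encode("ascii", "replace")   # substitutes a question mark
ignored = ch.encode("ascii", "ignore")     # silently drops the character

print(strict_failed, replaced, ignored)
```

With 'replace' and 'ignore' the original character is lost for good,
which is the cost of not fixing the error.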

Regards,
Martin



