[Python-Dev] deleting setdefaultencoding in site.py is evil

"Martin v. Löwis" martin at v.loewis.de
Thu Aug 27 08:47:59 CEST 2009


>> The ability to change the default encoding is a misfeature.  There's
>> essentially no way to write correct Python code in the presence of
>> this feature.
> 
> How so? If every single piece of text in your project is encoded in a
> superset of ascii (such as utf-8), why would this be a problem?

What is "every single piece of text"? Every string occurring in source
code? or also every single string that may be read from a file, a
socket, out of a database, or from a user interface?

How can you be certain that any string is UTF-8 when doing any
reasonable IO?

> Even if you were evil/stupid and mixed encodings, surely all you'd get
> is different unicode errors or maybe the odd strange character during
> display?

One specific problem is that dictionaries will stop working correctly if
you set the default encoding to anything but ASCII. The reason is that
with UTF-8 as the default encoding, you get

py> u"\u20ac" == u"\u20ac".encode("utf-8")
True
py> hash(u"\u20ac") == hash(u"\u20ac".encode("utf-8"))
False

So objects that compare equal will not hash equal. As a consequence, you
may have two different values for what should be the same key in a
dictionary.
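
To make the dictionary consequence concrete, here is a minimal sketch.
It assumes nothing about your application: Utf8EqualText is a
hypothetical class, not part of Python, that merely imitates how a
unicode string compares against a byte string once the default encoding
is UTF-8 (equal to its UTF-8 bytes, but hashed from the text). It is
written so that it also runs on interpreters where text and bytes never
compare equal on their own.

```python
class Utf8EqualText(object):
    """Hypothetical stand-in for a unicode string under a UTF-8 default encoding."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Compare equal to the UTF-8 encoded form, as the implicit
        # conversion would make a unicode string do.
        if isinstance(other, bytes):
            return self.text.encode("utf-8") == other
        if isinstance(other, Utf8EqualText):
            return self.text == other.text
        return NotImplemented

    def __hash__(self):
        # Hash the text itself, not its UTF-8 bytes: this is the mismatch.
        return hash(self.text)


euro_text = Utf8EqualText(u"\u20ac")
euro_bytes = u"\u20ac".encode("utf-8")

prices = {}
prices[euro_text] = "stored under the text key"
prices[euro_bytes] = "stored under the bytes key"

# The two keys compare equal, yet the dictionary keeps both entries,
# because lookup only compares keys whose hashes already match.
print(euro_text == euro_bytes)
print(len(prices))
```

The second assignment should have overwritten the first, since the keys
are equal; instead the dictionary ends up holding both.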

> Well, flipping that giant switch has worked in production for the past 5
> years, so I'm afraid I'll respectfully disagree. I'd suspect the
> pragmatics of real world software are why that function even exists,
> and it's extremely useful when used correctly...

It has worked in your application. See my example above: it is very easy
to create applications that stop working correctly if you use
setdefaultencoding at all (the only supported value is "latin-1",
since Unicode strings hash the same as byte strings if all characters
are in row 0).

Regards,
Martin

