Is setdefaultencoding bad?

Nobody nobody at nowhere.com
Wed Feb 23 00:40:29 EST 2011


On Tue, 22 Feb 2011 19:34:21 -0800, moerchendiser2k3 wrote:

> Hi, I embedded Py2.6.1 in my app and I use UTF-8 encoded strings
> everywhere in the interface, so the interface between my app and
> Python is UTF-8 so I can simply write:
> 
> print u"\uC042"
> print u"\uC042".encode("utf_8")
> 
> and get the corresponding Chinese char in the console. But currently
> sys.getdefaultencoding() is still ascii. Should I change it in site.py
> to utf-8, or is this not recommended somehow? I often read that
> it's highly discouraged, but I can't find an explanation why.

You shouldn't use sys.setdefaultencoding().

If your code needs to run on any system other than your own, it can't rely
upon the default encoding being set to anything in particular. So
changing the default encoding is an easy way to end up writing code which
doesn't work on any system except your own.
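The portable alternative is to encode and decode explicitly wherever
strings cross between your app and Python. A minimal sketch, assuming the
interface really is UTF-8 on both sides (to_app and from_app are names I
made up for illustration, not anything from your app):

```python
# -*- coding: utf-8 -*-
# Hypothetical boundary helpers; the names are illustrative only.

def to_app(text):
    """Encode a unicode string to UTF-8 bytes for the host application."""
    return text.encode("utf-8")

def from_app(data):
    """Decode UTF-8 bytes received from the host application."""
    return data.decode("utf-8")

raw = to_app(u"\uC042")   # explicit, so the default encoding is never consulted
text = from_app(raw)      # round-trips back to the original unicode string
```

Because the codec is named at every call, this behaves the same whatever
the default encoding happens to be on a given machine.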

And you can't change the default encoding outside of site.py (which
deletes sys.setdefaultencoding once it has run) because the value has to
be constant throughout the lifetime of the process.
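You can check what your interpreter ended up with. This is only a sketch
(default_encoding_info is a name I made up); after a normal start-up I'd
expect the second value to be False, though an embedded interpreter that
skips site.py may behave differently:

```python
import sys

def default_encoding_info():
    """Return the default encoding and whether it can still be changed."""
    return sys.getdefaultencoding(), hasattr(sys, "setdefaultencoding")

encoding, changeable = default_encoding_info()
print("default encoding: %s" % encoding)
print("setdefaultencoding still available: %s" % changeable)
```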

IIRC, if you use a unicode string as a dictionary key, and the key can be
converted using the default encoding, the hash is calculated on the
encoded byte string (so that if you have equivalent unicode and byte
strings, both hash to the same value). If you were to change the default
encoding after any dictionaries have been created (internally, Python uses
dictionaries quite extensively), subsequent lookups would use the wrong
hash values.
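To illustrate the failure mode, here is a toy model. It is not how
CPython implements anything: Key and ENCODING are hypothetical stand-ins
for a unicode key and the process-wide default encoding, with the hash
deliberately computed on the encoded bytes as described above.

```python
# Toy model only: ENCODING stands in for the process-wide default encoding.
ENCODING = "latin-1"

class Key(object):
    """Hypothetical key whose hash is taken from its encoded byte string."""

    def __init__(self, text):
        self.text = text

    def __hash__(self):
        # The hash depends on whatever ENCODING is at the time of the call.
        return hash(self.text.encode(ENCODING))

    def __eq__(self, other):
        return isinstance(other, Key) and self.text == other.text

def lookup_before_and_after():
    """Store a key, switch the encoding, and try to look the key up again."""
    global ENCODING
    ENCODING = "latin-1"
    table = {Key(u"caf\xe9"): "found"}
    before = Key(u"caf\xe9") in table   # hashed the same way as when stored
    ENCODING = "utf-8"
    after = Key(u"caf\xe9") in table    # equal key, but hashed differently now
    return before, after

before, after = lookup_before_and_after()
```

Here before is True, and after should be False: the key is still in the
dictionary, but the lookup probes with a different hash and misses it.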



