choosing a default text-encoding in Python programs

Ben Finney bignose+hates-spam at benfinney.id.au
Sun Feb 22 20:05:13 EST 2009


Joshua Judson Rosen <rozzin at geekspace.com> writes:

> If you have to make an assumption, I'd really think that it'd be
> better to use whatever the host OS's default is, if the host OS has
> such a thing--using an assumption of ISO 8859-1 works only in select
> regions on unix systems, and may fail even in those select regions
> on Windows, Mac OS, and other systems; without the OS
> considerations, just the regional constraints are likely to make an
> ISO-8859-1 assumption result in /incorrect/ results anywhere
> eastward of central Europe. Is a user in Russia (or China, or Japan)
> *really* most likely to be using ISO 8859-1?

The fallacy in the above is to assume that a given programmer will
only be opening files created in their current locale. I say that is a
fallacy, because programmers in fact open program files created all
over the world in different locales; and those files should, where
possible, be interpreted by Python the same way everywhere.

Assuming a *single*, defined, encoding in the absence of an explicit
declaration at least makes all Python installations (of a given
version) read any program file the same in any locale.
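
To make that concrete, here is a rough sketch rather than anything
Python itself does internally. The file contents, the list of "host
defaults", and all variable names are made up for illustration. It
decodes the same undeclared bytes under three hypothetical locale
defaults and gets three different readings, then shows that an explicit
declaration, or a fixed default, removes the dependence on the reader's
locale. The last lines use tokenize.detect_encoding from the Python 3
standard library, where the default for an undeclared file is UTF-8;
under Python 2 the default was ASCII.

```python
import io
import tokenize

# Bytes of a hypothetical program file saved by an author whose editor
# used KOI8-R, with no encoding declaration in the file.
source_bytes = 'greeting = "привет"\n'.encode("koi8_r")

# Hypothetical "host OS default" encodings on three readers' machines.
locale_defaults = ["koi8_r", "iso8859_1", "cp1251"]

# Guessing from the reader's locale: each machine sees different text.
per_locale = {enc: source_bytes.decode(enc) for enc in locale_defaults}
distinct_readings = set(per_locale.values())

# One fixed default: every machine sees the same text (even if it is
# not the text the author intended, it is at least consistent).
fixed_default = "iso8859_1"
fixed_readings = {source_bytes.decode(fixed_default) for _ in locale_defaults}

# An explicit declaration removes the guesswork altogether.
declared = b"# -*- coding: koi8-r -*-\n" + source_bytes
declared_encoding, _ = tokenize.detect_encoding(io.BytesIO(declared).readline)

# Without a declaration, Python 3 falls back to one defined default.
undeclared_encoding, _ = tokenize.detect_encoding(io.BytesIO(b"x = 1\n").readline)
```

The locale-based guess is only right for the one reader whose locale
happens to match the author's; the fixed default and the declaration
give every reader the same answer.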

-- 
 \           “If [a technology company] has confidence in their future |
  `\      ability to innovate, the importance they place on protecting |
_o__)     their past innovations really should decline.” —Gary Barnett |
Ben Finney


