Python 3 is killing Python

Chris Angelico rosuav at gmail.com
Wed Jul 16 12:07:28 EDT 2014


On Thu, Jul 17, 2014 at 1:48 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
> it is dangerous to assume that the file formats agree with
> the locale.

Of course. You never assume anything about encodings. What you do is
expect something about the encoding, and either throw an error if it's
wrong, or figure out some other encoding to use. With anything that
you broadly control (e.g. if your program is configured by a file in
/etc that nothing else uses), you just decode with whatever you
document your program as using, and any failure is *not your problem*.
It's that simple. You don't replace /etc/passwd with a JPEG-encoded
photograph of your family tree and expect all your family to be able
to log in; no more should you expect a file to be parsed correctly if
it's meant to be UTF-8 and you save it in ISO-8859-4. The two cases
are equally ridiculous.
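Here is a minimal sketch of the "expect an encoding, then either error out or try another one" approach, assuming the documented encoding is UTF-8. The function name decode_expected and its optional fallback parameter are hypothetical illustrations, not part of any library. With fallback=None, a file in the wrong encoding raises UnicodeDecodeError and becomes the caller's problem.

```python
from typing import Optional


def decode_expected(data: bytes, expected: str = "utf-8",
                    fallback: Optional[str] = None) -> str:
    """Decode bytes using the documented encoding.

    Hypothetical helper: if the bytes aren't valid in `expected`, either
    re-raise (the file isn't in the documented format, so it's not our
    problem) or, if the caller named one, try a single fallback encoding.
    """
    try:
        return data.decode(expected)
    except UnicodeDecodeError:
        if fallback is None:
            raise  # wrong encoding: report it, don't guess
        return data.decode(fallback)


# Usage: read raw bytes, then decode them under the documented expectation.
# with open("/etc/myprog.conf", "rb") as f:   # hypothetical path
#     text = decode_expected(f.read())
```

One design caveat: a single-byte fallback such as Latin-1 (ISO-8859-1) accepts every possible byte sequence, so it never fails and can silently hide a wrong guess. It only makes sense when you have a real reason to expect that encoding.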

The only thing that might be an issue is that you can't just use
open(fn) to read your files; you have to state the encoding
explicitly. That would be an understandable problem, especially for
someone who develops on a single platform and forgets that the
default (taken from the locale) differs between platforms. As long as
you always say encoding="utf-8" explicitly, and document that you do
so, any problems are someone else's.
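Here is a small sketch of the difference between relying on the default and stating the encoding. The roundtrip helper and the file name example.conf are hypothetical. locale.getpreferredencoding(False) reports the locale's preferred encoding, which is roughly what a bare open(fn) uses when no encoding is given (Python's UTF-8 mode overrides it).

```python
import locale
import os
import tempfile


def roundtrip(text: str) -> str:
    """Write a file and read it back, stating the encoding on both sides."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.conf")  # hypothetical file name
        # Explicit encoding: the bytes on disk are the same on every platform.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        # Reading with the same explicit encoding gives back the same text,
        # whatever the locale of the machine running this happens to be.
        with open(path, encoding="utf-8") as f:
            return f.read()


if __name__ == "__main__":
    # Roughly what a bare open(path) would use on this machine; it varies by
    # platform and locale (e.g. UTF-8 on many Linux systems, a code page
    # such as cp1252 on many Windows systems).
    print("locale default:", locale.getpreferredencoding(False))
    print(roundtrip("naïve café"))
```

Dropping encoding="utf-8" from either open() call makes the result depend on the machine. That is exactly the single-platform trap described above.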

ChrisA


