Python 3 is killing Python

Steven D'Aprano steve at pearwood.info
Wed Jul 16 22:51:56 EDT 2014


On Wed, 16 Jul 2014 19:20:14 +0300, Marko Rauhamaa wrote:

> Chris Angelico <rosuav at gmail.com>:
> 
>> The only thing that might be an issue is that you can't use open(fn) to
>> read your files, but you have to explicitly state the encoding. That
>> would be an understandable problem, especially for someone who develops
>> on a single platform and forgets that the default differs. As long as
>> you always explicitly say encoding="utf-8", and document that you do
>> so, any problems are someone else's.
> 
> Yes. I don't like open() guessing the encoding:

It doesn't *guess*. It has a sensible default encoding which, for most 
users most of the time, does the right thing. Ultimately though, the 
encoding is under your control: you can specify it if you think you know 
better.
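
A minimal sketch of what I mean, assuming Python 3. The scratch file and the variable names are made up for the example, and the default you see will depend on your platform and Python version:

```python
import locale
import os
import tempfile

# What the platform reports as its preferred text encoding.
platform_default = locale.getpreferredencoding(False)

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

# No encoding given: open() picks its default for us.
with open(path, "w") as f:
    f.write("Hello World")
    default_used = f.encoding

# Encoding given: open() uses exactly what we asked for.
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello World")
    explicit_used = f.encoding

print(platform_default, default_used, explicit_used)
```

Either way the file ends up containing the same eleven bytes, because the text is pure ASCII.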


>    The default encoding is platform dependent (whatever
>    locale.getpreferredencoding() returns)

Right. Most text files will be written using the preferred encoding, 
unless the user explicitly uses something else when writing the file (in 
which case it's the user's responsibility) or they've got the file from 
another system with a different encoding. But even then, the most common 
encodings are ASCII-compatible, which means that the lowest common 
denominator case (reading and writing ASCII files) will Just Work.
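
To illustrate the ASCII-compatibility point, here is a small sketch. The particular list of encodings is my own choice of common ones, not anything special to Python:

```python
text = "Hello World"  # pure ASCII

# A handful of common ASCII-compatible encodings (an illustrative selection).
encodings = ["ascii", "utf-8", "latin-1", "cp1252"]
encoded = {name: text.encode(name) for name in encodings}

# Every one of them produces byte-for-byte identical output for ASCII text.
all_same = len(set(encoded.values())) == 1

# Counter-example: UTF-16 is not ASCII-compatible, so its bytes differ.
utf16_differs = text.encode("utf-16-le") != text.encode("ascii")

print(all_same, utf16_differs)
```

So a file of plain ASCII written under any of those encodings can be read back under any of the others.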

From a purity stand-point, no, open() shouldn't have a default encoding, 
and the user should have to specify it. But what makes you imagine that 
the user will know the correct encoding better than Python does? The 
average coder[1] shouldn't have to care about encodings just to do 
file.write("Hello World"), and on the average computer they don't have to 
because Python sets a sensible default.


But you know what? From a purity stand-point, *even binary mode* assumes 
an encoding of sorts. How do you know that binary files on your platform 
use eight-bit bytes? Some DSPs use 9-bit bytes, and historically 
computers had as few as 6 or as many as 60 bits per byte. This is why the 
C standard requires that a byte is *at least* 8 bits.

But, having said that, the assumption that binary files are based on 
8-bit bytes is pretty safe. It would be foolish to force the majority of 
people, who don't need to care about these sorts of details, to care 
about them just to suit the one in ten-thousand who do.

Likewise with text files. Python picks sensible defaults which will suit 
most people, rather than forcing people to guess at the encoding and 
likely get it wrong. But it's only a default: you can explicitly set it 
if you believe the file in question uses a different encoding.
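
Here is a sketch of the case where you do know better. It assumes a file that some other system wrote as Latin-1; the file name and contents are invented for the example:

```python
import os
import tempfile

# Hypothetical file written elsewhere in Latin-1 ("caf\u00e9" is "cafe" with e-acute).
path = os.path.join(tempfile.mkdtemp(), "cafe.txt")
with open(path, "wb") as f:
    f.write("caf\u00e9".encode("latin-1"))  # the bytes b'caf\xe9'

# Reading it with the wrong encoding fails: a lone 0xE9 is not valid UTF-8.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    wrong_encoding_failed = False
except UnicodeDecodeError:
    wrong_encoding_failed = True

# Naming the right encoding explicitly recovers the original text.
with open(path, encoding="latin-1") as f:
    recovered = f.read()

print(wrong_encoding_failed, recovered == "caf\u00e9")
```

The default never gets a say here, because the encoding argument overrides it.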


[...]
> In each case, it would have been better to default to bytes just like
> subprocess does.

Better for whom? You? Maybe. For the typical programmer that Python is 
designed for? Hell no.
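
For comparison, this is the subprocess behaviour being referred to, as I understand it. It's a sketch that runs the current interpreter as the child process, which is an arbitrary choice for the demonstration:

```python
import subprocess
import sys

# Child process: the same Python interpreter, printing one ASCII word.
cmd = [sys.executable, "-c", "print('hello')"]

# By default the captured output comes back as bytes.
raw = subprocess.check_output(cmd)

# Asking for text mode gives str, decoded with a default encoding.
text = subprocess.check_output(cmd, universal_newlines=True)

print(type(raw).__name__, type(text).__name__)
```

With subprocess you opt in to text; with open() you opt out of it by asking for binary mode.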




[1] Let's be honest, there still is a bias towards English and ASCII in 
computing, and probably this will remain the case until English ceases to 
be a de facto lingua franca. Most programming languages are written for 
J. Random Hacker, not Jランダムハッカー.


-- 
Steven



More information about the Python-list mailing list