Python 3 is killing Python

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed Jul 16 08:10:16 EDT 2014


On Wed, 16 Jul 2014 13:46:45 +0300, Marko Rauhamaa wrote:

> Python 3 really is on a mission to elevate text into the mainstream at
> the expense of bytes. I'm guessing this is done primarily to promote the
> cross-platform transparency of Python code.

Ahead of bytes? Possibly. At the expense of bytes? Certainly not. If 
there is anything that you cannot conveniently do with bytes, that you 
could do in Python 2, it's likely a bug, or at least an obviously missing 
feature. The core devs recognise that they missed some use-cases (e.g. 
mixed bytes and text) which are now harder than they should be, and are 
on a mission to rectify that as much as possible within the constraints 
of backward compatibility.

E.g. having b"abc"[0] return 97 instead of b"a" was probably a mistake, 
but there are four versions of Python 3.x that do it that way and it's 
too late to change until Python 5000. (Python 4 is unlikely to break 
backwards compatibility in a big way.)
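
For anyone who hasn't run into it, here is a minimal sketch of that 
indexing behaviour in Python 3, along with the usual workarounds. The 
variable names are mine and purely illustrative.

```python
data = b"abc"

# Indexing a bytes object in Python 3 gives an int, not a length-1 bytes.
first_as_int = data[0]            # 97

# Slicing gives bytes back, which is the usual workaround.
first_as_bytes = data[0:1]        # b'a'

# Iterating over bytes also yields ints.
as_ints = list(data)              # [97, 98, 99]

# Going the other way: wrap the int in a list to rebuild one-byte bytes.
rebuilt = bytes([first_as_int])   # b'a'
```

Note that bytes(97) is not the inverse of data[0]: it builds 97 zero 
bytes, which is why the list wrapper is needed.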


> For me, a linux system and network programmer, that layer of frosting
> only gets in my way and I need to wash it off.

Linux, like all Unixes, is primarily a text-based platform. With a few 
exceptions, /etc is filled with text files, not binary files, and half 
the executables on the system are text (Python, Perl, bash, sh, awk, 
etc.). 

http://www.catb.org/esr/writings/taoup/html/ch05s01.html

To say that *dealing with text* gets in your way on a Linux system is 
rather like saying that you love Mac OS X except for its gosh-awful GUI 
and APIs.

Of course, as a network programmer, you have to deal with bytes, so I'll 
give you a bit of leeway.


>> Most programming languages I know of default to opening files in text
>> mode, not binary mode, and I don't see any strong reason for Python to
>> go against the tide there.
> 
> In unix and linux, there never was a separate text mode for files. When
> you open a file, you open a file -- and stuff bytes in it. There is no
> commonly accepted text file encoding. UTF-8 comes close to being a
> standard, but I know somebody who sticks to an ISO-8859-1 locale.

And they should be dragged out into the street and beaten with a Clue 
Stick. They're the sort of people who are holding us back from the 
shining utopia of UTF-8 everywhere!

(only half joking)

But seriously, I cannot imagine any *rational* reason for using a legacy 
encoding, but I'm willing to give this person the benefit of the doubt 
that he's not a raving lunatic or old West European-centric curmudgeon 
trying to deny the existence of the rest of the world.

http://i.imgur.com/UeZan.jpg

That being the case, then good luck to him. As for everyone else:

http://www.utf8everywhere.org/


>> Having len('λ') == 1 is not an advanced text processing feature.
> 
> There are (relative rare) occasions where you'd like to treat text as
> text.

o_O

Relatively rare. Like, um, email, news, html, Unix config files, Windows 
ini files, source code in just about every language ever, SMSes, XML, 
JSON, YAML, instant messenger apps, word processors... even *graphic* 
applications invariably have a text tool. Now, it may be true that some 
of those things may not use text under the hood, but even so, text is 
ubiquitous.

Even binary protocols often include chunks of recognisable human-readable 
text in them:

[steve@ando Pictures]$ hexdump -n 64 -C picture.jpg
00000000  ff d8 ff e0 00 10 4a 46  49 46 00 01 01 00 00 01  |......JFIF......|
00000010  00 01 00 00 ff e2 0f 38  49 43 43 5f 50 52 4f 46  |.......8ICC_PROF|
00000020  49 4c 45 00 01 01 00 00  0f 28 61 70 70 6c 02 10  |ILE......(appl..|
00000030  00 00 6d 6e 74 72 52 47  42 20 58 59 5a 20 07 de  |..mntrRGB XYZ ..|
00000040
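
To make that concrete, here is a rough sketch that takes the 16 bytes 
from the first row of the dump above and pulls the readable marker out of 
them. The hex string is copied from that row; the variable names are 
illustrative only.

```python
# First 16 bytes of the file, copied from the first row of the hexdump.
header = bytes.fromhex("ff d8 ff e0 00 10 4a 46 49 46 00 01 01 00 00 01")

# The data is binary, but we can still search it for an ASCII marker.
offset = header.find(b"JFIF")                        # 6

# Decode only the slice known to be ASCII text.
marker = header[offset:offset + 4].decode("ascii")   # 'JFIF'

# The two leading bytes are plain binary and are compared as bytes.
starts_with_ffd8 = header[:2] == b"\xff\xd8"         # True
```

The bytes stay bytes until the moment a specific chunk is known to be 
text, and only that chunk gets decoded.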


> Then, it's nice to be able to move the data on the operating table
> with .decode() and when the patient has been sewn back together, you can
> release them with .encode().
> 
> More often, len(b'λ') is what I want.

Oh really? Are you sure? What exactly is b'λ'?

I couldn't have made up a better example of the confusion between bytes 
and text if I had tried. Thank you.
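
A small sketch of the distinction, assuming Python 3. There the literal 
b'λ' is not even legal, since bytes literals may only contain ASCII 
characters, so "the length of b'λ'" depends entirely on which encoding 
you had in mind. The three encodings below are just examples I picked.

```python
text = 'λ'                            # one code point, U+03BB
n_chars = len(text)                   # 1

# The same character as bytes, under three different encodings.
utf8 = text.encode('utf-8')           # b'\xce\xbb'
utf16 = text.encode('utf-16-le')
greek = text.encode('iso-8859-7')

lengths = (len(utf8), len(utf16), len(greek))   # (2, 2, 1)

# Decoding with the matching encoding gets the text back.
round_trip = utf8.decode('utf-8')     # 'λ'

# A bytes literal containing a non-ASCII character is a SyntaxError.
try:
    compile("b'λ'", "<example>", "eval")
    literal_ok = True
except SyntaxError:
    literal_ok = False
```

So len() of the text is always 1, while len() of the bytes is 1 or 2 
here depending on the encoding.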



-- 
Steven


