Cult-like behaviour [was Re: Kindness]

Marko Rauhamaa marko at pacujo.net
Sun Jul 15 07:17:51 EDT 2018


Steven D'Aprano <steve+comp.lang.python at pearwood.info>:

> On Sun, 15 Jul 2018 11:43:14 +0300, Marko Rauhamaa wrote:
>> Paul Rubin <no.email at nospam.invalid>:
>>> I don't think Go is the answer either, but it probably got strings
>>> right.  What is the answer?
>
> Go strings aren't text strings. They're byte strings. When you say that 
> Go got them right, that depends on your definition of success.
>
> If your definition of "success" is:
>
> - fail to be able to support 80% + of the world's languages
>   and a majority of the world's text;

Of course byte strings can support at least as many languages as
Python3's code point strings and at least equally well.

> - perpetuate the anti-pattern where a single code point
>   (hex value) can represent multiple characters, depending
>   on what encoding you have in mind;

That doesn't follow at all.

> - to have a language where legal variable names cannot be
>   represented as strings; [1]

That's a rather Go-specific and uninteresting question, but I'm fairly
certain you can write a Go parser in Go (if that's not how it's done
already).

> - to have a language where text strings are a second-class
>   data type, not available in the language itself, only in
>   the libraries;

Unicode code point strings *ought* to be a second-class data type. They
were a valiant idea but in the end turned out to be a mistake.

> - to have a language where text characters are *literally* 
>   32-bit integers ("rune" is an alias to int32);
>
>   (you can multiply a linefeed by a grave accent and get pi)

Again, that has barely anything to do with the topic at hand. I don't
think there's any unproblematic way to capture a true text character,
period. Python3 certainly hasn't been able to capture it.

>> That's the ten-billion-dollar question, isn't it?!
>
> No. The real ten billion dollar question is how people in 2018 can
> stick their head in the sand and take seriously the position that
> Latin-1 (let alone ASCII) is enough for text strings.

Here's the deal: text strings are irrelevant for most modern programming
needs. Most software is middleware between the human and the terminal
device. Carrying opaque octet strings from end to end is often the most
correct and least problematic thing to do.
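
A minimal Python 3 sketch of the pass-through case, assuming a payload
that is not valid UTF-8. The names raw, forward and forward_as_text are
made up for illustration; they are not from any library.

```python
# Hypothetical payload: Latin-1-style bytes that are not valid UTF-8.
raw = b"caf\xe9 au lait"


def forward(payload: bytes) -> bytes:
    """Hypothetical middleware hop that carries the octets untouched."""
    return payload


def forward_as_text(payload: bytes) -> bytes:
    """Hypothetical hop that insists on decoding to str before passing on."""
    return payload.decode("utf-8").encode("utf-8")


# The opaque hop delivers exactly what it received.
delivered = forward(raw)

# The decoding hop rejects the payload before it can deliver anything.
try:
    forward_as_text(raw)
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(delivered == raw, decode_failed)
```

The opaque hop never needs to know the encoding; the decoding hop has
to guess one and fails when the guess is wrong.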

On the other hand, Python3's code point strings mess things up for no
added value. You still can't upcase or downcase strings. You still can't
sort strings. You still can't perform random access on strings. You
still don't know how long your string is. You still don't know where you
can break a string safely. You still don't know how to normalize a
string. You still don't know if two strings are equal or not. You still
don't know how to concatenate strings.
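
A short Python 3 sketch of the length, equality and case points, using
only the standard unicodedata module. The helper same_text is a made-up
name, and NFC is just one possible choice of normalization form.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare after NFC normalization (an assumption)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Both render as the same character, yet the lengths differ.
print(len(composed), len(decomposed))     # 1 2

# Plain == compares code points, so the two spellings are unequal.
print(composed == decomposed)             # False

# They compare equal only after an explicit normalization step.
print(same_text(composed, decomposed))    # True

# Indexing splits the base letter from its accent.
print(decomposed[0])                      # e

# Upcasing can change the length of a string.
print(len("\u00df"), len("\u00df".upper()))  # 1 2
```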


Marko


