Cult-like behaviour [was Re: Kindness]

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Jul 15 10:52:46 EDT 2018


On Sun, 15 Jul 2018 14:17:51 +0300, Marko Rauhamaa wrote:

> Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
> 
>> On Sun, 15 Jul 2018 11:43:14 +0300, Marko Rauhamaa wrote:
>>> Paul Rubin <no.email at nospam.invalid>:
>>>> I don't think Go is the answer either, but it probably got strings
>>>> right.  What is the answer?
>>
>> Go strings aren't text strings. They're byte strings. When you say that
>> Go got them right, that depends on your definition of success.
>>
>> If your definition of "success" is:
>>
>> - fail to be able to support 80% + of the world's languages
>>   and a majority of the world's text;
> 
> Of course byte strings can support at least as many languages as
> Python3's code point strings and at least equally well.

You cannot possibly be serious.

There are 256 possible byte values. China alone has over 10,000 different 
characters. You can't represent 10,000+ characters using only 256 
distinct code points.

You can't even represent the world's languages using 16-bit word-strings 
instead of byte strings.
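
For anyone who wants to check the numbers, the interpreter makes the point 
in a couple of lines (the sample characters are arbitrary; any CJK 
ideograph or emoji would do):

py> ord("中")    # a common Chinese ideograph
20013
py> ord("😀")    # an emoji, outside even the 16-bit range
128512
py> "中".encode('latin-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u4e2d' in position 0: ordinal not in range(256)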

Watching somebody argue that byte strings are "equally as good" as a 
dedicated Unicode string type in 2018 is like seeing people argue in the 
late 1990s that this new-fangled "structured code" will never be better 
than unstructured code with GOTO.


>> - perpetuate the anti-pattern where a single code point
>>   (hex value) can represent multiple characters, depending on what
>>   encoding you have in mind;
> 
> That doesn't follow at all.

Of course it does. You talked about using Latin-1. What's so special 
about Latin-1? Ask your Greek customers how useful that is to them, and 
explain why they can't use ISO-8859-7 instead.
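
If you want to see the anti-pattern in action, here is the identical byte 
decoded under those two encodings:

py> b'\xe1'.decode('latin-1')
'á'
py> b'\xe1'.decode('iso-8859-7')
'α'

One byte value, two completely different characters, depending on nothing 
but which encoding you happen to have in mind.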


>> - to have a language where legal variable names cannot be
>>   represented as strings; [1]
> 
> That's a rather Go-specific 

We were talking about whether or not Go had done strings right.

> and uninteresting question, 

It's not a question, it's a statement. And it might be uninteresting to 
you, but I find it astonishing.

> but I'm fairly certain you can write a Go parser in Go

So what? You can write a Go parser in FlooP if you like.

https://en.wikipedia.org/wiki/BlooP_and_FlooP


> (if that's not how it's done already).
> 
>> - to have a language where text strings are a second-class
>>   data type, not available in the language itself, only in the
>>   libraries;
> 
> Unicode code point strings *ought* to be a second-class data type. They
> were a valiant idea but in the end turned out to be a mistake.

Just because you say they were a mistake, doesn't make it so.


>> - to have a language where text characters are *literally*
>>   32-bit integers ("rune" is an alias to int32);
>>
>>   (you can multiply a linefeed by a grave accent and get pi)
> 
> Again, that has barely anything to do with the topic at hand.

It has *everything* to do with the topic at hand: did Go get strings 
right?
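
And that claim isn't hyperbole. You can verify the arithmetic from Python, 
which (unlike Go) refuses to multiply the characters themselves:

py> ord('\n') * ord('`')
960
py> chr(960)
'π'
py> '\n' * '`'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'str'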


> I don't
> think there's any unproblematic way to capture a true text character,
> period. Python3 certainly hasn't been able to capture it.

Isaac Asimov's quote here is appropriate:

    When people thought the Earth was flat, they were wrong. 
    When people thought the Earth was spherical, they were 
    wrong. But if you think that thinking the Earth is 
    spherical is just as wrong as thinking the Earth is flat,
    then your view is wronger than both of them put together.


Unicode does not perfectly capture the human concept of "text 
characters" (and no consistent system ever will, because the human 
concept of a character is not consistent). But if you think that makes 
byte-strings *better* than Unicode text strings at representing text, 
then you are wronger than wrong.
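
One concrete example of the fuzziness: what a reader sees as a single 
character may be one code point or two, and Python at least gives you the 
tools to deal with it:

py> import unicodedata
py> s1 = 'é'            # one code point, U+00E9
py> s2 = 'e\u0301'      # 'e' followed by COMBINING ACUTE ACCENT
py> len(s1), len(s2)
(1, 2)
py> s1 == s2
False
py> unicodedata.normalize('NFC', s2) == s1
True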

 
>>> That's the ten-billion-dollar question, isn't it?!
>>
>> No. The real ten billion dollar question is how people in 2018 can
>> stick their head in the sand and take seriously the position that
>> Latin-1 (let alone ASCII) is enough for text strings.
> 
> Here's the deal: text strings are irrelevant for most modern programming
> needs. Most software is middleware between the human and the terminal
> device.

Your view is completely, utterly inside out. It is the terminal that is the 
middle layer, sitting between the software and the human, not the software.


> Carrying opaque octet strings from end to end is often the most
> correct and least problematic thing to do.

> On the other hand, Python3's code point strings mess things up for no
> added value. You still can't upcase or downcase strings.

Ah, the ol' "argument by counter-factual assertions". State something 
that isn't true, and claim it is true.

py> "αγω".upper()
'ΑΓΩ'

Looks like uppercasing to me. What does it look like to you? Taking a 
square root?
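
Nor is it a naive one-to-one table lookup; Python applies the full Unicode 
case mappings:

py> 'straße'.upper()
'STRASSE'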

(I can't believe I need to actually demonstrate this.)




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



