Cult-like behaviour [was Re: Kindness]

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sat Jul 14 22:55:27 EDT 2018


On Sun, 15 Jul 2018 09:07:17 +1000, Chris Angelico wrote:

> On Sun, Jul 15, 2018 at 8:15 AM, Marko Rauhamaa <marko at pacujo.net>
> wrote:
>> Chris Angelico <rosuav at gmail.com>:
>>
>>> On Sun, Jul 15, 2018 at 5:54 AM, Marko Rauhamaa <marko at pacujo.net>
>>> wrote:
>>>> True enough. Modern-day protocols as well as Linux file formats and
>>>> commands intentionally blur the line between strings and bytes. The
>>>> software in question deals with all of the above. It is virtually
>>>> impossible to keep track of what is "really" text and what is
>>>> "really" binary.

Of course we have no idea what Marko's software is, or what it is doing, 
but frankly that seems pretty implausible to me. On the face of it, it 
seems as ridiculous as the claim that he can't tell which variables are 
quote-unquote "really" lists of weights and which are lists of distances.

To me, this sounds more like an admission that Marko is working with a 
shitty code base than like a fundamental problem with Python. But 
dealing with shitty code bases is the reality.


>>>> In the end, the Gordian Knot was sliced by using
>>>> Python3's strings for everything and restricting oneself to Latin-1
>>>> codepoints (almost) everywhere.
[...]

I wonder whether Marko's Python 2.7 code base was ever actually tested 
with non-Latin1 text. I suspect that if Marko had (let's say) Japanese 
users expecting to use CJK characters in the application, his affection 
for the 2.7 version would be a lot less.


[Marko]
>> What I'm saying is that I'm using Python3
>> strings as holders for bytes. Since every byte is a valid Unicode code
>> point, a Python3 string can hold any sequence of bytes.
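Marko's trick depends on Latin-1 (ISO 8859-1), which maps every byte value 0 to 255 to the code point with the same number. That makes decoding arbitrary bytes as Latin-1 and re-encoding them lossless. Here is a minimal sketch of that round trip. The helper names are hypothetical and do not come from Marko's code:

```python
def bytes_to_str(data: bytes) -> str:
    # Latin-1 maps byte 0xNN to code point U+00NN, so any byte sequence decodes
    return data.decode("latin-1")


def str_to_bytes(text: str) -> bytes:
    # Only code points U+0000..U+00FF survive; anything higher raises UnicodeEncodeError
    return text.encode("latin-1")


raw = bytes(range(256))
assert str_to_bytes(bytes_to_str(raw)) == raw  # every byte value round-trips
```

The round trip only works in one direction. Any str that holds a code point above U+00FF, such as a CJK character, cannot be turned back into bytes this way.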

[Chris] 
> Since every byte is also a valid IEEE 754 64-bit binary floating point
> value, a sequence of floats can hold any sequence of bytes, too. Is it a
> good idea to use floats to represent bytes?


3.6e-322 1.6e-322 4.8e-322 5.1e-322 5.63e-322 5e-322 5e-322 1.63e-322
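The numbers above take Chris's suggestion literally. Each one is a subnormal IEEE 754 double whose 64-bit pattern, read as an unsigned integer, is a single byte value. The sketch below does the conversion with the standard struct module. It assumes one byte per float, and the function names are hypothetical:

```python
import struct
from typing import List, Sequence


def bytes_to_floats(data: bytes) -> List[float]:
    # Use each byte value n as the bit pattern of a double (i.e. n * 2**-1074)
    return [struct.unpack("<d", struct.pack("<Q", n))[0] for n in data]


def floats_to_bytes(values: Sequence[float]) -> bytes:
    # Recover each double's bit pattern; bytes() raises ValueError if one exceeds 255
    return bytes(struct.unpack("<Q", struct.pack("<d", x))[0] for x in values)


msg = [3.6e-322, 1.6e-322, 4.8e-322, 5.1e-322, 5.63e-322, 5e-322, 5e-322, 1.63e-322]
print(floats_to_bytes(msg))  # b'I agree!'
```

It works, in the same sense that storing bytes in a str works.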



> Text strings and sequences of bytes *are different*.

At an implementation level, everything is bytes. People do so insist on 
conflating implementation with interface, even when they don't need to...

(Sometimes I think people should be required to implement algorithms on 
analogue computing devices before they're allowed to write code for 
digital computers, just to drive home the point that neither bytes nor 
bits are fundamental to computing, but are mere implementation details.)

At a semantic level, byte strings and text strings represent 
fundamentally different things, as distinct as weights and lengths.

Unfortunately, due to the long influence of ASCII in computing, a lot of 
people have internalised that "byte 0x41 *really is* the letter A" when 
that's merely an encoding convention. You wouldn't add 5kg to 5cm and 
expect to get a meaningful result, but people expect to combine bytes and 
text and "just make it work".
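The encoding point can be made concrete. Under three standard codecs, the same character "A" becomes three different byte sequences, and Python 3 refuses to combine bytes and text implicitly. Here is a small illustration. The helper name is hypothetical, and cp500 is one of Python's EBCDIC codecs:

```python
def encodings_of(char: str) -> dict:
    # One abstract character, several byte-level conventions
    return {codec: char.encode(codec) for codec in ("ascii", "cp500", "utf-16-le")}


print(encodings_of("A"))
# {'ascii': b'A', 'cp500': b'\xc1', 'utf-16-le': b'A\x00'}

try:
    b"A" + "A"  # mixing bytes and str is refused rather than guessed at
except TypeError as exc:
    print("TypeError:", exc)
```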

One might as well say that the bytes b'@=<\xed\x91hr\xb0' really are the 
number 29.238 and expect to multiply your name by 12.5 and get your 
height in seconds.
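The example bytes are not arbitrary. Read as a big-endian IEEE 754 double, those eight bytes are exactly the nearest double to 29.238. Read as Latin-1, they are an equally valid and equally meaningless string. What they "really are" depends entirely on the convention applied:

```python
import struct

raw = b'@=<\xed\x91hr\xb0'
as_float = struct.unpack(">d", raw)[0]  # big-endian 64-bit double
as_text = raw.decode("latin-1")         # one character per byte

print(as_float)  # 29.238
print(as_text)   # eight Latin-1 characters, no more "real" than the float
```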


[Marko]
>> Couldn't you use bytes objects everywhere for the same purpose?
>>
>> Yes and no.
>>
>> Yes, but it would be ugly as hell and would involve changing a large
>> percentage of the source code.

It would also require re-inventing the entire Unicode infrastructure 
already provided -- unless you intended to just say No to 99% of human 
languages in the world, including English, in favour of restricting 
everyone, including English speakers, to an artificial subset of the 
characters they use in real life.

(Even Latin1 doesn't cover all the English punctuation marks I expect to 
be able to use in text.)

It's not 1970 any more. Under what circumstances is that acceptable?
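Here is a small taste of what giving up the Unicode infrastructure costs. bytes methods only understand ASCII letters, and Latin-1 has no room for everyday English punctuation such as curly quotes or the ellipsis character. This is an illustrative sketch, and the helper name is hypothetical:

```python
word = "stra\u00dfe"               # "strasse" spelled with sharp s
encoded = word.encode("utf-8")     # b'stra\xc3\x9fe'

print(word.upper())                # STRASSE  (full Unicode case mapping)
print(encoded.upper())             # b'STRA\xc3\x9fE'  (only ASCII letters change)
print(len("na\u00efve"), len("na\u00efve".encode("utf-8")))  # 5 6


def latin1_encodable(text: str) -> bool:
    # True only if every character lies in U+0000..U+00FF
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


print(latin1_encodable("caf\u00e9"))              # True
print(latin1_encodable("\u201cquoted\u201d"))     # False: curly quotes
print(latin1_encodable("wait\u2026"))             # False: ellipsis character
```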


>> No, as a large number of Python3 facilities require str objects as
>> arguments. Consider urllib.request.urlopen(), for example, which
>> requires a URL to be an str object.

That's because URLs are fundamentally text strings.

Quick quiz: which of the following are real URLs?

(a)  http://правительство.рф

(b)  http://παράδειγμα.δοκιμή

(c)  http://실례.테스트

(d)  All of the above.

https://uxmag.com/articles/a-url-in-any-language
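Hostnames like these stay text all the way down to the network layer. Only there are they converted to an ASCII-compatible "xn--" form. Python's built-in "idna" codec shows the conversion. It implements the older IDNA 2003 rules (RFC 3490), while the third-party idna package covers IDNA 2008, so treat this as a sketch rather than what any particular HTTP client does:

```python
hosts = ["\u043f\u0440\u0430\u0432\u0438\u0442\u0435\u043b\u044c\u0441\u0442\u0432\u043e.\u0440\u0444",  # правительство.рф
         "\u03c0\u03b1\u03c1\u03ac\u03b4\u03b5\u03b9\u03b3\u03bc\u03b1.\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae",  # παράδειγμα.δοκιμή
         "\uc2e4\ub840.\ud14c\uc2a4\ud2b8"]                                                               # 실례.테스트

for host in hosts:
    ascii_form = host.encode("idna")   # ASCII bytes, one "xn--" label per non-ASCII label
    print(host, "->", ascii_form)
    assert ascii_form.decode("idna") == host  # decoding recovers the original text
```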



> Well, duh. It also doesn't accept a list of floats, just because you
> COULD represent a text string that way.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



