Cult-like behaviour [was Re: Kindness]

Sun Jul 15 07:04:46 EDT 2018

On Sun, 15 Jul 2018 11:39:40 +0300, Marko Rauhamaa wrote:

> Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
> 
>> Of course we have no idea what Marko's software is, or what it is
>> doing,
> 
> Correct, you don't, but the link Paul Rubin posted gives you an idea:
> 
>    Python 3 says: everything is Unicode (by default, except in certain
>    situations, and except if we send you crazy reencoded data, and even
>    then it's sometimes still unicode, albeit wrong unicode).

I have a lot of respect for Armin Ronacher, but I think here he is badly 
wrong and he's just ranting.

It is ludicrous to say "everything" is Unicode when Python provides a 
rich set of bytes APIs. He squeezes in a parenthesised "by default" 
there, but that undermines his rant. That's like saying that "everything 
in Python is an int" rather than a float, because if you don't include a 
decimal point or an exponent in numeric literals, you get ints. Or that 
"files in Python are always read-only" because the default for open() is 
to use read mode rather than write mode.
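
To make the "defaults are only defaults" point concrete, here is a small 
sketch using only the standard library. The temporary directory and the 
variable names are made up for illustration; nothing here comes from 
Armin's post.

```python
import os
import tempfile

# Numeric literals default to int, but floats are one character away.
assert type(1) is int
assert type(1.0) is float
assert type(1e0) is float

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README")

    # Write mode has to be requested explicitly...
    with open(path, "w") as f:
        f.write("hello")

    # ...because the default mode is "r": text, read-only.
    with open(path) as f:
        default_mode = f.mode
        text = f.read()

    # Binary mode hands back bytes instead of str.
    with open(path, "rb") as f:
        raw = f.read()

    # The os functions follow the type of the argument:
    # str path in, str names out; bytes path in, bytes names out.
    names_text = os.listdir(tmp)
    names_bytes = os.listdir(os.fsencode(tmp))
```

The text API is the default, and the bytes API sits right beside it.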

>    Filenames
>    are Unicode, Terminals are Unicode, stdin and out are Unicode,

And indeed they are, in Windows, and so they should be, in Unix too. 
Maybe some day POSIX will recognise that the rest of the world exists and 
stop privileging ASCII.
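
In the meantime, Python 3 can still round-trip POSIX file names that 
aren't valid in the expected encoding, using the surrogateescape error 
handler (PEP 383). A minimal sketch, assuming a UTF-8 file system 
encoding and a made-up Latin-1 file name; I decode explicitly rather than 
call os.fsdecode() so that the result doesn't depend on the platform:

```python
# A file name as the kernel sees it: Latin-1 bytes, not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Undecodable bytes become lone surrogates instead of raising an error.
as_text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
back_to_bytes = as_text.encode("utf-8", errors="surrogateescape")
```

Here as_text is 'caf\udce9.txt': not pretty, but nothing is lost.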

>    there
>    is so much Unicode! And because UNIX is not Unicode, Python 3 now has
>    the stance that it's right and UNIX is wrong

Armin seems to be implying that Unix is (1) the only OS in the world, and 
(2) beyond criticism. Neither of these is correct. Windows users might 
rightly ask why Armin cares what Unix does.

Unix does a lot right, but not everything

http://web.mit.edu/~simsong/www/ugh.pdf

and its "everything is bytes" stance is badly wrong when it comes to user-
visible textual elements like file names and the command prompt. We write 
`less README`, not `6c65737320524541444d45`, and we should stop pretending 
that we're using bytes just because the underlying infrastructure uses 
bytes. We're using text.
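
For anyone who wants to see what that command looks like as raw bytes, 
a quick sketch (the variable names are just for illustration):

```python
command = "less README"

# What the infrastructure stores: the ASCII bytes, shown here as hex.
hex_form = command.encode("ascii").hex()

# What the user actually typed and reads back: text.
roundtrip = bytes.fromhex(hex_form).decode("ascii")

print(hex_form)
print(roundtrip)
```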

>> That's because URLs are fundamentally text strings.
> 
> <URL: https://tools.ietf.org/html/rfc1738>:

Irrelevant or obsolete or both.

> A URL consists of ASCII-only characters that represent an octet string.

Wrong.

>> Quick quiz: which of the following are real URLs? (a) 
>> http://правительство.рф
> 
> On the face of it, that is not a valid URL.

If you had read the link I gave, or even if you copied and pasted the URL 
into any reasonably modern browser, you might have learned that it is a 
valid URL.
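
You can check this from Python as well. A rough sketch using only the 
standard library: urlsplit() and the built-in "idna" codec. The variable 
names are mine, and the codec implements the older IDNA 2003 rules, so 
treat it as an illustration rather than a complete IDN implementation.

```python
from urllib.parse import urlsplit

url = "http://правительство.рф"

# The URL parses as text, and the host name is text.
host = urlsplit(url).hostname

# The ASCII-compatible ("xn--...") form is what goes over the wire to DNS.
ascii_host = host.encode("idna").decode("ascii")

# Decoding the ASCII form gives back the name people actually read.
back = ascii_host.encode("ascii").decode("idna")

print(host)
print(ascii_host)
```

The ASCII form is an encoding detail; the URL the user sees and types is 
the Unicode one.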

> But try this:
[snip]

Indeed. Is there a reason why these shouldn't be considered serious bugs 
in the http library?

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson