Unicode [was Re: Cult-like behaviour]

Jim Lee jlee54 at gmail.com
Mon Jul 16 14:43:27 EDT 2018



On 07/16/18 11:31, Steven D'Aprano wrote:
> On Mon, 16 Jul 2018 10:27:18 -0700, Jim Lee wrote:
>
>> Had you actually read my words with *intent* rather than *reaction*, you
>> would notice that I suggested the *option* of turning off Unicode.
> Yes, I know what you wrote, and I read it with intent.
>
> Jim, you seem to be labouring under the misapprehension that anytime
> somebody spots a flaw in your argument, or an unpleasant implication of
> your words, it can only be because they must not have read your words
> carefully. Believe me, that is not the case.
>
> YOU are the one who raised the specter of politically correct groupthink,
> not me. That's dog-whistle politics. But okay, let's move on from that.
>
> You say that all you want is a switch to turn off Unicode (and replace it
> with what? Kanji strings? Cyrillic? Shift_JIS? no of course not, I'm being
> absurd -- replace it with ASCII, what else could any right-thinking
> person want, right?). Let's look at this from a purely technical
> perspective:
>
> Python already has two string data types, bytes and text. You want
> something that is almost functionally identical to bytes, but to call it
> text, presumably because you don't want to have to prefix your strings
> with a b"" (that was also Marko's objection to byte strings).
>
> Let's say we do it. Now we have three string implementations that need to
> be added, documented, tested, maintained, instead of two.
>
> (Are you volunteering to do this work?)
>
> Now we need to double the testing: every library needs to be tested
> twice, once with the "Unicode text" switch on, once with it off, to
> ensure that features behave as expected in the appropriate mode.
>
> Is this switch a build-time option, so that we have interpreters built
> with support for Unicode and interpreters built without it? We've been
> there: it's a horribly bad idea. We used to have Python builds with
> threading support, and others without threading support. We used to have
> Python builds with "wide Unicode" and others with "narrow Unicode".
> Nothing good comes of this design.
>
> Or perhaps the switch is a runtime global option?
>
> Surely you can imagine the opportunities for bugs, both obvious crashing
> bugs and non-obvious silent failure bugs, that will occur when users run
> libraries intended for one mode under the other mode. Not every library
> is going to be fully tested under both modes.
>
> Perhaps it is a compile-time option that only affects the current module,
> like the __future__ imports. That's a bit more promising, it might even
> use the __future__ infrastructure -- but then you have the problem of
> interaction between modules that have this switch enabled and those that
> have it disabled.
>
> More complexity, more cruft, more bugs.
>
> It's not clear that your switch gives us *any* advantage at all, except
> the warm fuzzy feelings that no dirty foreign characters might creep into
> our pure ASCII strings. Hmm, okay, but frankly apart from when I copy and
> paste code from the internet and it ends up bringing in en-dashes and
> curly quotes instead of hyphens and typewriter quotes, that never
> happens to me by accident, and I'm having a lot of trouble seeing how it
> could.
>
> If you want ASCII byte strings, you have them right now -- you just have
> to use the b"" string syntax.
>
> If you want ASCII strings without the b prefix, you have them right now.
> Just use only ASCII characters in your strings.
>
> I'm simply not seeing the advantage of:
>
>      from __future__ import no_unicode
>      print("Hello World!")  # stand in for any string handling on ASCII
>
> over
>
>      print("Hello World!")
>
> which works just as well if you control the data you are working with and
> know that it is pure ASCII.
>
>
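
For reference, the two existing options named in the quoted text (byte
strings written with b"", and ordinary strings that contain only ASCII)
can be sketched roughly as below. This is only an illustration under
stated assumptions: the helper name require_ascii is hypothetical and
is not part of Python or of any proposal in this thread. It shows one
way a program could reject stray non-ASCII characters, such as pasted
curly quotes, without any interpreter switch.

```python
# Option 1: a byte string, written with the b"" prefix.
greeting_bytes = b"Hello World!"

# Option 2: an ordinary text string that happens to contain only ASCII.
greeting_text = "Hello World!"


def require_ascii(text):
    """Hypothetical helper: return text unchanged if it is pure ASCII,
    otherwise raise ValueError naming the first offending character."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        raise ValueError(
            f"non-ASCII character {text[exc.start]!r} at index {exc.start}"
        ) from exc
    return text


# Both spellings carry the same ASCII data.
assert greeting_text.encode("ascii") == greeting_bytes

print(require_ascii(greeting_text))
```

Calling require_ascii on a string containing a curly quote or an
en-dash raises ValueError, so the check happens at the point where the
data enters the program rather than through a global mode.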

Had you spoken this way from the start instead of ridiculing and
name-calling, perhaps we could have reached an agreement.

However, the point is moot, as I have unsubscribed from the list. The 
conversations here (especially yours) are too condescending to waste 
more time with.




