Cult-like behaviour [was Re: Kindness]

Chris Angelico rosuav at gmail.com
Sun Jul 15 19:24:08 EDT 2018


On Mon, Jul 16, 2018 at 9:10 AM, Jim Lee <jlee54 at gmail.com> wrote:
>
>
> On 07/15/18 16:04, Chris Angelico wrote:
>>
>>
>> You claimed that Unicode was insignificant to many programs. I'm
>> trying to say that a Unicode text string is a vital part of any
>> program that works with text, which is pretty much anything that talks
>> to humans. You keep saying that ... well you keep saying different
>> things, and I've lost track of what your point actually is, but you
>> want a way to... disable Unicode? Or something? And you have yet to
>> give any example of a program that doesn't need Unicode, but still
>> uses text.
>>
>> ChrisA
>
>
> Why does this seem so obtuse to you?
>
> Have you never heard of programming BEFORE Unicode existed?
>
> How ever did we get along?  It must have been a hallucination...

Yes, I was writing code before Unicode existed. Have you ever heard of
IBM DBCS? Here's something I could find on the web, though back then,
I was using non-internet documentation for everything:

https://www.ibm.com/support/knowledgecenter/en/ssw_i5_54/nls/rbagssqlanddbchars.htm

Most programs assumed SBCS, with a single system-wide primary
codepage, and a number of "available" codepages. If you needed to
switch code pages, you would generally do so for the entire program.
If you needed to mix and match codepages in a program, life was hard.
If you needed to mix and match codepages in a document, life was
extremely hard.

Using DBCS allowed a lot more variety of languages, but you had to use
dedicated DBCS APIs for everything. Even just indexing characters
couldn't be done, because of shift codes and such. An operation like
"remove the rightmost three characters from this string" required a
function like DBRRIGHT() rather than taking it the easy way.

I do not want to go back to those days. UTF-8 has a few of those
problems (for instance, indexing characters is hard, since it's a
variable-width encoding), but at least every character has exactly one
byte sequence that represents it, meaning that valid UTF-8 strings can be
joined trivially and interpreted without needing context. Using a Python 3
string, you don't even have to worry about that - you just work with
characters as fundamental units.

That is why this seems obtuse to me. There is no benefit to going back
to a pre-Unicode way of working with text.

ChrisA


