python 2.7 and unicode (one more time)

Marko Rauhamaa marko at pacujo.net
Sat Nov 22 09:18:32 EST 2014


Steven D'Aprano <steve+comp.lang.python at pearwood.info>:

> You haven't given any good reason for objecting to calling Unicode
> strings by what they are. Maybe you think that it is an implementation
> detail, and that some version of Python might suddenly and without
> warning change to only supporting KOI8-R strings or GB2312 strings? If
> so, you are badly mistaken. The fact that Python strings are Unicode
> is not an implementation detail, it is part of the language semantics.

To me, repeating the word Unicode everywhere gives the (in and of
itself impressive) standard too prominent a status. While understanding
how Unicode, IEEE-754, two's complement, mark-and-sweep etc. work is very
useful, and can occasionally be exploited explicitly, those really
are mundane techniques for implementing abstractions.

Python's strings exist (primarily) so you can express utterances in a
human language, aka plain text. They don't exist to express Unicode code
points. That would be putting the cart before the horse.

> "Rectangular door" makes perfect sense, and in a world where there are
> dozens of legacy non-rectangular doors, it would be very sensible to
> specify the kind of door.

It makes sense, and yet I've never heard anyone talk about rectangular
doors, even though I use numerous doors every day. Why is it, then, that
people feel the constant need to add the "Unicode" epithet to Python's
strings, which, according to their own specification, are just
strings?


Marko


