python 2.7 and unicode (one more time)

Sun Nov 23 01:17:40 EST 2014

random832 at fastmail.us wrote:

> On Fri, Nov 21, 2014, at 23:38, Steven D'Aprano wrote:
>> I really don't understand what bothers you about this. In Python, we have
>> Unicode strings and byte strings. In computing in general, strings can
>> consist of Unicode characters, ASCII characters, Tron characters, EBCDIC
>> characters, ISO-8859-7 characters, and literally dozens of others. It
>> boggles my mind that you are so opposed to being explicit about what sort
>> of string we are dealing with.
> 
> I think he means that it should be implementation-defined, with an API
> that does not allow programs to make assumptions about the encoding,
> like C, to allow for implementations that use a different character set.

Python is not C, and doesn't make every second thing undefined behaviour.

If Python treated the character set as an implementation detail, the
programmer would have no way of knowing whether

s = u"ö"

is legal or not, since you cannot know whether or not ö is a supported
character in the running Python. It might work on your system, and fail for
other people. That is worse than the old distinction between "narrow"
and "wide" builds. It would be a lazy and stupid design, and especially
stupid since there really is no good alternative to Unicode today. ASCII is
not even sufficient for American English, the whole Windows code page idea
is a horrible mess, and none of the legacy encodings are suitable for more
than a tiny fraction of the world.
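
To make that concrete, here is a minimal sketch of how one explicitly
Unicode string fares under a handful of encodings. The helper name
try_encode and the particular list of encodings are illustrative choices
only. The literal itself is always legal; only the conversion to bytes can
fail, and when it does, it fails with an exception rather than silently.

```python
# Works on Python 2.7 and Python 3.3+.
s = u"\u00f6"  # LATIN SMALL LETTER O WITH DIAERESIS, i.e. the "ö" above


def try_encode(text, encoding):
    """Return text encoded as bytes, or None if the encoding has no way
    to represent it. (Hypothetical helper, for illustration only.)"""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# The same single character under Unicode and several legacy encodings.
encodings = ("utf-8", "latin-1", "cp1252", "iso-8859-7", "ascii")
results = {name: try_encode(s, name) for name in encodings}

for name in encodings:
    print("%s: %r" % (name, results[name]))
```

UTF-8, Latin-1 and cp1252 all produce bytes (two bytes for UTF-8, the single
byte 0xF6 for the other two), while ASCII and the Greek ISO-8859-7 yield
None because neither has an ö at all.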

-- 
Steven