python 2.7 and unicode (one more time)

Roy Smith roy at panix.com
Sat Nov 22 09:44:02 EST 2014


In article <87y4r348uf.fsf at elektro.pacujo.net>,
 Marko Rauhamaa <marko at pacujo.net> wrote:

> Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
> 
> > You haven't given any good reason for objecting to calling Unicode
> > strings by what they are. Maybe you think that it is an implementation
> > detail, and that some version of Python might suddenly and without
> > warning change to only supporting KOI8-R strings or GB2312 strings? If
> > so, you are badly mistaken. The fact that Python strings are Unicode
> > is not an implementation detail, it is part of the language semantics.
> 
> To me, repeating the word Unicode everywhere is giving the (in and of
> itself impressive) standard too primary a status. While understanding
> how Unicode, IEEE-754, 2's complement, mark-and-sweep etc work is very
> useful and occasionally can be taken explicit advantage of, those really
> are mundane techniques to implement abstractions.
> 
> Python's strings exist (primarily) so you can express utterances in a
> human language, aka plain text. They don't exist to express Unicode code
> points. That would be putting the cart before the horse.
> 
> > "Rectangular door" makes perfect sense, and in a world where there are
> > dozens of legacy non-rectangular doors, it would be very sensible to
> > specify the kind of door.
> 
> It makes sense, and yet, I've never heard anyone talk about rectangular
> doors even though I use numerous doors every day. Why is it, then, that
> people feel the constant need to add the "Unicode" epithet to Python's
> strings, which -- according to its own specification -- are just
> strings?
> 
> 
> Marko

There's an old joke to the effect that the fields of study which are 
confident that they're really doing science (e.g. chemistry, biology, 
physics, astronomy) don't put the word "science" in their names.  
It's only the fields of study that are less confident about their status 
as sciences (computer science, behavioral science, political science, 
etc.) that feel the need to explicitly say "science", as if repeating it 
enough times makes it true.  I wonder if something similar applies 
here?  <ducking and running>

Somewhat more seriously, the IEEE-754 point is quite apropos.  Back when 
754 first came out, there were lots of different floating point 
implementations.  Machines that used 754 touted it in their sales 
literature and mentioned it all over their documentation.  These days, 
754 is so ubiquitous that nobody even thinks to mention it, in the same 
way nobody bothers to mention 2's complement integers.  I suspect that 
some day the same thing will happen with Unicode.  For that matter, we 
will eventually get to the point where, when people say "just plain 
text", they will mean Unicode, in the same way that "just plain text" today 
really means ASCII (and the text/plain MIME type will become a 
historical curiosity).



More information about the Python-list mailing list