python 2.7 and unicode (one more time)
Rustom Mody
rustompmody at gmail.com
Sat Nov 22 10:57:22 EST 2014
On Saturday, November 22, 2014 8:14:15 PM UTC+5:30, Roy Smith wrote:
> Marko Rauhamaa wrote:
>
> > Steven D'Aprano:
> >
> > > You haven't given any good reason for objecting to calling Unicode
> > > strings by what they are. Maybe you think that it is an implementation
> > > detail, and that some version of Python might suddenly and without
> > > warning change to only supporting KOI8-R strings or GB2312 strings? If
> > > so, you are badly mistaken. The fact that Python strings are Unicode
> > > is not an implementation detail, it is part of the language semantics.
> >
> > To me, repeating the word Unicode everywhere is giving the (in and of
> > itself impressive) standard too primary a status. While understanding
> > how Unicode, IEEE-754, 2's complement, mark-and-sweep etc work is very
> > useful and occasionally can be taken explicit advantage of, those really
> > are mundane techniques to implement abstractions.
> >
> > Python's strings exist (primarily) so you can express utterances in a
> > human language, aka plain text. They don't exist to express Unicode code
> > points. That would be putting the cart before the horse.
> >
> > > "Rectangular door" makes perfect sense, and in a world where there are
> > > dozens of legacy non-rectangular doors, it would be very sensible to
> > > specify the kind of door.
> >
> > It makes sense, and yet, I've never heard anyone talk about rectangular
> > doors even though I use numerous doors every day. Why is it, then, that
> > people feel the constant need to add the "Unicode" epithet to Python's
> > strings, which -- according to its own specification -- are just
> > strings?
> >
> >
> > Marko
>
> There's an old joke to the effect that the fields of study which are
> confident that they're really doing science (i.e. chemistry, biology,
> physics, astronomy, etc) don't put the word "science" in their names.
> It's only the fields of study that are less confident about their status
> as sciences (computer science, behavioral science, political science,
> etc) that feel the need to explicitly say "science". As if repeating it
> enough times makes it true. I wonder if something of the same thing
> applies here? <ducking and running>
>
> Somewhat more seriously, the IEEE-754 point is quite apropos. Back when
> 754 first came out, there were lots of different floating point
> implementations. Machines that used 754 touted it in their sales
> literature and mentioned it all over their documentation. These days,
> 754 is so ubiquitous, nobody even thinks to mention it, in the same way
> nobody bothers to mention 2's complement integers. I suspect that some
> day, the same thing will happen with Unicode. For that matter, we will
> eventually get to the point where when people say, "just plain text",
> they will mean Unicode, in the same way that "just plain text" today
> really means ASCII (and the text/plain MIME type will become a
> historical curiosity).
Yes, this was my point also -- encodings in general, and Unicode in
particular, are a mess (as of 2014). Maybe in a few years the dust
will settle. Then saying 'Unicode' will become redundant.
But until then, while we have a rather leaky abstraction, having
sealing liquid on the hands is preferable to sewage in the house.
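
A small sketch of what "leaky" can look like in practice. The sample
word and the two codecs are arbitrary choices made for illustration,
not anything taken from the thread; the snippet should behave the same
on 2.7 and 3.x because it sticks to u'' literals and explicit
encode/decode calls.

```python
# Illustrative only: the sample word and the codecs are assumptions.
text = u"na\u00efve"  # 5 code points; \u00ef is "i with diaeresis"

# The same text has different byte lengths under different encodings.
utf8_bytes = text.encode("utf-8")      # the non-ASCII character takes 2 bytes
latin1_bytes = text.encode("latin-1")  # one byte per character

# Decoding with the wrong codec raises no error here; it silently
# produces different text (mojibake).
misread = utf8_bytes.decode("latin-1")

print(len(text), len(utf8_bytes), len(latin1_bytes))
print(misread == text)
```

As long as the bytes/text boundary shows through like this, saying
which kind of string you mean still carries information.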