python 2.7 and unicode (one more time)

Marko Rauhamaa marko at pacujo.net
Mon Nov 24 01:57:33 EST 2014


Gregory Ewing <greg.ewing at canterbury.ac.nz>:
> Marko Rauhamaa wrote:
>> Unicode strings is not wrong but the technical emphasis on Unicode is as
>> strange as a "tire car" or "rectangular door" when "car" and "door" are
>> what you usually mean.
>
> The reason Unicode gets emphasised so much is that until relatively
> recently, it *wasn't* what "string" usually meant in Python.
>
> When Python 3 has been around for as long as Python 2 was, things may
> change.

Yes, people call strings "Unicode strings" because Python2 *did have*
unicode strings separate from regular strings:

    Python2            Python3
    --------------------------------------
    string             bytes (byte string)
    unicode string     string
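
To make the Python3 column concrete, here is a minimal sketch, assuming a
Python 3 interpreter. The variable names and the sample word are only
illustrative:

```python
# Python 3 only: str holds Unicode code points, bytes holds raw octets.
text = "na\u00efve"             # str (what Python2 called a unicode string)
data = text.encode("utf-8")     # bytes (what Python2 called a string)

assert isinstance(text, str)
assert isinstance(data, bytes)

# The lengths differ because U+00EF takes two octets in UTF-8.
text_len = len(text)            # counts code points
data_len = len(data)            # counts octets

# Decoding with the same codec round-trips back to the original text.
round_trip = data.decode("utf-8")
```

The two types do not mix implicitly in Python3: concatenating text and
data raises a TypeError, so the conversion has to be spelled out with
encode() or decode().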


In Python2 days, Unicode was a fancy, exotic datatype for the
connoisseurs. The rest used strings. Python3 supposedly elevates Unicode
to boring normalcy. Now it's bytes that have fallen into (unmerited)
disfavor.

But old habits die hard; you call cars "automobile cars" instead of
"cars" since, after all, "cars" were always pulled by horses...


Marko

PS Maybe interestingly, Guile went through an analogous transition. As
of Guile 2.0,

  a character is anything in the Unicode Character Database.
  [...]
  Strings are fixed-length sequences of characters.
  [...]
  A bytevector is a raw bit string.

  <URL: https://www.gnu.org/software/guile/manual/html_node/index.html>

However, Guile 1.8 still had:

  The Guile implementation of character sets currently deals only with
  8-bit characters.

  <URL: https://www.gnu.org/software/guile/docs/docs-1.8/guile-ref/index.html>

and there were no bytevectors.


