Python's 8-bit cleanness deprecated?

Roman Suzi rnd at onego.ru
Sat Feb 8 01:54:50 EST 2003


This thread is into its second life, it seems.

On Fri, 7 Feb 2003, Scott David Daniels wrote:

>Simo Salminen wrote:
>> * Kirill Simonov [Fri, 7 Feb 2003 18:39:56 +0200]
>>>...But what is the price that we pay for this? The millions of Python
>>>scripts that use 8-bit string literals or comments are broken now in
>>>order to allow the feature that no one ever used! I think that this is
>>>an extreme.
>> ...
>> This change only makes python hostile to regular programmer, who
>> does not care about encodings, and only wants to use simple 8-bit
>> characters in comments.
>
>I told myself to be quiet, but ....

[skipping]
>_must_know_the_encodings_.  This is delightful, since now we can
>have a code repository where we can pull contributed code written
>in Brazil, Serbia, Kyoto, and Thailand from a single repository

But would you trust such code? How could I examine it?
Even if I vaguely understand some Spanish or Serbian words, 
I do not have a chance with Japanese or Thai...

>safely.  The sole cost is explicit encoding.  We could probably
>even cope with EBCDIC, were someone lusting to use old character
>codes, since we need to only look at the first two lines.  If
>we cannot find an encoding in the first two lines looking at
>simple ASCII, we try as EBCDIC and look to see if we find it.
>If not, we then try big5 and ....
>
>Roman Suzi asked:
>     "how one would feel if '# -*- coding: ascii -*-' would be
>      necessary for every program?"
>I replied,
>     "I would probably never use it.  If I had to use an encoding,

And you do not use an encoding right now? How so?

>      I would probably use: '# -*- coding: UTF-8 -*-', since I could
>      encode other author's names in comments (or credit strings)."

Great! Usually TeX notation is used for that.

>I really have no idea whether I am mentioning issues here that
>people don't realize, or simply spouting off my opinions to a
>group that finds them unconvincing.  I, of course, hope to be doing
>the former and will resume my silence for fear that I am doing the
>latter.

UTF-8 is not universally supported yet, unfortunately.

OK, OK. I am using the new scheme and declaring my favorite KOI8-R
encoding.

Now, look at the program (I hope you can read it, because this message
is also marked as koi8-r ;-)

#!/usr/bin/python
# -*- coding: koi8-r -*-

l1 = ["а", "в", "ю"]
l1.sort()
for i in l1: print i,
print

l2 = [u"а", u"в", u"ю"]
l2.sort()
for i in l2: print i,
print

# End of kkk.py

And what do I get? Surprise!

ю а в
Traceback (most recent call last):
  File "kkk.py", line 13, in ?
    for i in l2: print i,
UnicodeEncodeError: 'ascii' codec can't encode character '\u430' in position 0: ordinal not in range(128)

- Russian "ju" is BEFORE "a"! That is, sorting is not done according to
the rules of the alphabet.
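
The order of the first output line follows from the KOI8-R byte values
themselves. A minimal sketch, written in modern Python 3 rather than the
2.3 of this post, and assuming the plain ""-literals hold raw KOI8-R
bytes (the variable names are only illustrative):

```python
letters = ["а", "в", "ю"]

# Byte values of each letter in KOI8-R: ю is 0xC0, а is 0xC1, в is 0xD7.
koi8_bytes = [s.encode("koi8-r") for s in letters]

# Sorting byte strings compares byte values, not alphabet positions,
# so ю (0xC0) comes first.
byte_sorted = [b.decode("koi8-r") for b in sorted(koi8_bytes)]
# byte_sorted == ['ю', 'а', 'в']

# Unicode strings compare by code point (U+0430, U+0432, U+044E), which
# for these three letters happens to match alphabetical order.
codepoint_sorted = sorted(letters)
# codepoint_sorted == ['а', 'в', 'ю']
```

Code point order is not a real collation either; it just happens to agree
with the alphabet for these letters.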

OK, I know, it is the locale I need. But the question is: WHY DO I NEED
u""-literals in my source if I have to encode explicitly into some encoding
every time I output something?

l2 = [u"а", u"в", u"ю"]
l2.sort()
for i in l2: print i.encode('koi8-r'),
print

At last it is right:

ю а в
а в ю

Adding coding: koi8-r is confusing, because it gives a false sense that
characters will then be sorted correctly!

*

Can anybody give me a hint on how to work comfortably with
non-ASCII encodings in Python 2.3?
(I do not like specifying u"" and .encode('koi8-r') all the time!)
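
One possible direction, offered only as a sketch: wrap the output stream
once in an encoding writer, so that individual writes need no .encode().
The example uses codecs.getwriter from the standard library and is
written for modern Python 3; whether it behaves the same under 2.3 has
not been checked here. The BytesIO buffer is a stand-in (an assumption
for illustration) for a real byte-oriented stdout.

```python
import codecs
import io

# Stand-in for a byte-oriented output stream such as stdout.
raw = io.BytesIO()

# codecs.getwriter returns the StreamWriter class for the codec;
# wrapping the byte stream with it makes write() accept text.
out = codecs.getwriter("koi8-r")(raw)

words = sorted(["ю", "а", "в"])
out.write(" ".join(words) + "\n")  # no explicit .encode() at the call site

encoded = raw.getvalue()  # the KOI8-R bytes that reached the stream
```

This moves the encoding decision to one place but does not remove it, and
it does nothing for sorting, which still needs a locale-aware key.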

I converted the above example to UTF-8 and now get:

п╟ п╡ я▌
а в ю

;-)
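
The first line of that output looks like UTF-8 bytes shown by a terminal
that expects KOI8-R. A small sketch reproducing the effect in modern
Python 3 (the variable names are illustrative, and the terminal setup is
an assumption):

```python
text = "а в ю"

# Each Cyrillic letter takes two bytes in UTF-8.
utf8_bytes = text.encode("utf-8")

# A KOI8-R terminal shows every byte as its own character, so each
# letter turns into a Cyrillic letter followed by a box-drawing glyph.
seen_on_terminal = utf8_bytes.decode("koi8-r")
# seen_on_terminal == 'п╟ п╡ я▌'
```

The byte-wise sort happens to come out in alphabetical order here only
because the UTF-8 byte sequences of these letters sort that way.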

Or maybe there is a way to tell Python to treat all ""-literals as 
Unicode?

OK, OK. In Python 2.3 a simple "Hello, World!" will be a non-trivial
exercise (if done in languages other than English).

<joke>

OK. A Python program is no longer a sequence of 8-bit bytes. What is next? Will
it be required at some later time to write programs in MS Word? That way many
more fancy things could be added. My favorite, decision tables, would be quite
natural. And authors' photos could be presented along with the code too.
Multimedia objects could be defined just as easily as string literals.

Maybe Guido should take a hint. I am sure most Python users will be happy that
they can edit their programs in MS Word, as they do with other documents. No
need to learn Emacs, vi or Notepad.exe. And adding new syntax will be trivial:
just add a new style and there you go.

And it is not all that blue-sky either: it is possible to parse XML with
Python, and nothing prevents preprocessing a Python program with an XML parser
before execution. And there are converters from MS Word to XML too.

So, myprogram.doc.py will be a quite natural choice.

</joke>

Sincerely yours, Roman Suzi
-- 
rnd at onego.ru =\= My AI powered by Linux RedHat 7.3





