Harri Pasanen: Re: [Idle-dev] Known bug? Saving fails in IDLE if accents used in characters

Sun, 23 Mar 2003 23:33:51 +0100 (CET)

> Saving my python file, still consisting of the single line
> # déjà
>
> it spits out a message box with a title "I/O Error"
> Non-ASCII found, yet no encoding declared.  Add a line like
> #-*- coding: iso-8859-15-*-
> to your file [OK]
>
> After clicking OK, it seems to save my file without problems, but it
> bugs me with the message at each save.

Hi Harri,

There is not much we can do about this; IDLE *could* try to edit
the file for you, but I consider this too intrusive, hence the error
message.
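
For illustration, the kind of check behind that message can be sketched
as follows. This is a rough sketch in present-day Python, not IDLE's
actual code; the name needs_encoding_warning and the simplified pattern
are assumptions made for the example.

```python
import re

# Simplified declaration pattern modelled on PEP 263
# (an assumption for this sketch, not IDLE's own code).
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def needs_encoding_warning(text):
    """Return True if text has non-ASCII characters but no coding declaration."""
    try:
        text.encode("ascii")
        return False  # pure ASCII: nothing needs to be declared
    except UnicodeEncodeError:
        pass
    # PEP 263 only honours a declaration on the first or second line.
    for line in text.split("\n")[:2]:
        if CODING_RE.match(line):
            return False
    return True
```

A file consisting only of the comment "# déjà" would trigger the
warning under this sketch; the same file with a declaration on its
first line would not.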

> The message box is modal, and I cannot cut and paste the line from it.

Patches to correct this are welcome. I'm unsure what Tk widget to use
that allows copyable-but-uneditable text.

> Typing it in manually, my next I/O Error is:
>
> Unknown encoding iso-8859-15-.
> Saving as UTF-8
> [OK]

Are you sure there was no space between the 5 and the - in the message?
Adding a space should help.
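
To see why the space matters, here is a small sketch using the
declaration pattern documented in PEP 263. The helper name
declared_encoding is made up for the example, and the real tokenizer
does more than this. The character class for the encoding name
includes "-" but not "*", so without the space the trailing hyphen
becomes part of the name.

```python
import re

# Declaration pattern as documented in PEP 263.
PEP263_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def declared_encoding(line):
    """Return the encoding name declared on this line, or None."""
    match = PEP263_RE.match(line)
    return match.group(1) if match else None

# No space before "-*-": the hyphen is swallowed into the name.
print(declared_encoding("#-*- coding: iso-8859-15-*-"))   # iso-8859-15-

# With the space, the name ends where it should.
print(declared_encoding("#-*- coding: iso-8859-15 -*-"))  # iso-8859-15
```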

> Now this is the greatest, as trying to run the resulting file in
> python gives:
>
> [harri@kapu harri]$ python t.py
>   File "t.py", line 1
>     ï»¿#-*- coding: iso-8859-15-*-
>     ^
> SyntaxError: invalid syntax

Yes; only from Python 2.3 on will this no longer be a syntax error.
If you correct the encoding declaration (by adding the missing
space), the problem will go away.
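
The three odd characters in front of the "#" in that traceback are the
UTF-8 byte order mark (bytes EF BB BF) shown in a Latin-1 terminal. A
minimal sketch in present-day Python of how one might detect and strip
it; the helper name split_utf8_bom is hypothetical:

```python
import codecs

def split_utf8_bom(data):
    """Return (had_bom, rest) for the raw bytes of a source file."""
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data

# Bytes as they would be on disk after a "Saving as UTF-8" with a BOM.
raw = codecs.BOM_UTF8 + b"#-*- coding: iso-8859-15-*-\n"
had_bom, rest = split_utf8_bom(raw)

# Interpreted as Latin-1, the BOM bytes display as three characters.
bom_as_latin1 = codecs.BOM_UTF8.decode("latin-1")
```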

> Hmm... Perl seems to have a pragma to enable UTF-8 in source code,
> but I was not aware Python would have support for UTF-8 source.  Does
> it?

Indeed; this is the result of PEP 263. Unlike Perl, Python supports
multiple different source encodings, hence the need for an explicit
declaration.
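
As a small illustration of why the declaration has to be explicit, the
same bytes read differently under different encodings. This sketch
uses present-day Python; the helper name decode_attempts and the sample
bytes are made up for the example.

```python
def decode_attempts(data, encodings=("iso-8859-1", "iso-8859-15", "utf-8")):
    """Decode the same bytes under several encodings; None marks a failure."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# "déjà" followed by byte 0xA4, as a Latin-1 or Latin-9 editor would store it.
sample = b"d\xe9j\xe0 \xa4"
results = decode_attempts(sample)

# 0xA4 is the currency sign in iso-8859-1 but the euro sign in iso-8859-15,
# and the byte sequence is not valid UTF-8 at all.
for name, text in results.items():
    print(name, repr(text))
```

Nothing in the bytes themselves says which reading is intended, so the
file has to state it.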

> Now how about doing what 99% of other editors do, and supporting by
> default iso-8859-15 (basically latin-1), that will make a couple
> of hundred million Europeans happy right there?

Supporting this in IDLE would be acceptable, I guess. However, in the
long run, Python itself will refuse source code that lacks a proper
encoding declaration, so I felt that IDLE should teach users how to
do that early on.

> Extending the support outside of latin alphabets is an honorable goal,
> but clearly whatever the encoding is should not munge with the
> python source code, unless python itself has support for it.

But Python does have support for it.

> The following link has some info on how Java deals with this issue:
> http://www.jorendorff.com/articles/unicode/java.html (basically the
> java compiler does support multiple charsets).

So does Python. However, the Java method is fundamentally flawed:
Whoever invokes javac needs to know what the source encoding is, and
it needs to be the same for all source code files. I find it
unacceptable that users of a library have to know what encoding the
library uses.

In any case, I recommend reading PEP 263.

> All my sympathy for the Japanese/Chinese/... there,  how do you
> program in python/C/C++?   I would assume externalizing the strings
> would be the easiest, or are there specialized editors that handle
> gracefully non-ASCII, non-Latin comments and strings inside ASCII
> source code?

You would think that people do that, but they don't:
a) they want to put comments into source code, in their native language,
   using the encoding that their system uses.
b) they want to use non-ASCII in identifiers (not supported in Python 2.3,
   but may be supported in the future)
c) they do put non-ASCII into string literals and Unicode literals and
   expect this to work. In particular for Unicode literals, this cannot
   work without an encoding declaration. If they target a single language
   only, putting the text into the source is a natural thing to do; the
   overhead of an external message catalogue is unacceptable.

Regards,
Martin