Defining Python Source Code Encodings

Roman Suzi rnd at onego.ru
Wed Jul 18 02:54:40 EDT 2001


On Wed, 18 Jul 2001, Terry Reedy wrote:

> 
> > 3. Python's tokenizer/compiler combo will need to be updated to
> >    work as follows:
> >
> >    1. read the file
> >    2. decode it into Unicode assuming a fixed per-file encoding
> >    3. tokenize the Unicode content
> >    4. compile it, creating Unicode objects from the given Unicode data
> >       and creating string objects from the Unicode literal data
> >       by first reencoding the Unicode data into 8-bit string data
> >       using the given file encoding
> >
> >    To make this backwards compatible, the implementation would have to
> >    assume Latin-1 as the original file encoding if not given (otherwise,
> >    binary data currently stored in 8-bit strings wouldn't make the
> >    roundtrip).
> 
> If I understand this, you would translate (my) ASCII code files into
> Unicode, compile, and translate literal strings back to the ASCII form they
> started as.  Can this be done without lengthening the compile time 'too
> much'?
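The four steps in the quoted proposal, and the Latin-1 default it relies on, can be sketched in modern Python 3 (3.8 or later). This is only an illustration under stated assumptions, not the proposed implementation: `decode_source` and `reencode_literal` are hypothetical helper names, the encoding is passed in explicitly instead of being read from the file, and `ast.parse` stands in for the tokenizer/compiler. It does show why the round trip Terry asks about is lossless: Latin-1 maps each of the 256 byte values to exactly one code point, so ASCII or arbitrary 8-bit literal data comes back unchanged.

```python
import ast

# Hypothetical helpers illustrating steps 2 and 4 of the quoted proposal;
# they are not part of any real Python API.
DEFAULT_ENCODING = "latin-1"  # the proposal's backwards-compatible default


def decode_source(raw: bytes, encoding: str = DEFAULT_ENCODING) -> str:
    """Step 2: decode the whole file with one fixed per-file encoding."""
    return raw.decode(encoding)


def reencode_literal(text: str, encoding: str = DEFAULT_ENCODING) -> bytes:
    """Step 4: turn decoded literal data back into 8-bit string data."""
    return text.encode(encoding)


# Step 1: raw bytes of a hypothetical source file; \xe9 and \xff are the
# Latin-1 bytes for 'é' and 'ÿ'.
raw_source = b"data = 'caf\xe9 \xff'\n"

# Steps 2 and 3: decode to Unicode, then tokenize/parse the Unicode text.
tree = ast.parse(decode_source(raw_source))
literal = tree.body[0].value.value  # the str value of the string literal

# Step 4: reencoding with the same encoding restores the original bytes.
assert reencode_literal(literal) == b"caf\xe9 \xff"

# Latin-1 round-trips every possible byte value, which is why the proposal
# uses it as the default for files that declare no encoding.
all_bytes = bytes(range(256))
assert reencode_literal(decode_source(all_bytes)) == all_bytes
```

With any other default (UTF-8, say), some byte sequences would fail to decode, and existing binary data in 8-bit literals would break.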
> 
> > Issues that still need to be resolved:
> 
> > - what to do with non-literal data in the source file, e.g.
> >   variable names and comments:
> >
> >   * reencode them just as would be done for literals
> >   * only allow ASCII for certain elements like variable names
> >   etc.
> 
> I strongly suspect that people who do not write any Latin alphabet language
> would strongly prefer to write names and comments in their native script.
> This would open Python to millions who are presently excluded.
> Mixed-alphabet texts are pretty common in some non-Latin alphabet
> countries.
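The second option in the quoted list, allowing only ASCII in names, amounts to a lexical check on identifier tokens. Python 3 eventually went the other way (PEP 3131 permits non-ASCII identifiers), so a project wanting an ASCII-only policy would have to enforce it itself. A minimal sketch using the standard `tokenize` module (Python 3.9 or later) follows; the function name `non_ascii_names` is hypothetical.

```python
import io
import tokenize


def non_ascii_names(source: str) -> list[str]:
    """Return the identifiers in `source` that contain non-ASCII characters.

    Hypothetical helper sketching the "only allow ASCII for names" option.
    """
    found = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # NAME tokens are identifiers and keywords; comments and string
        # literals are separate token types, so they are deliberately skipped.
        if tok.type == tokenize.NAME and not tok.string.isascii():
            found.append(tok.string)
    return found


sample = "slovar = {}\nсловарь = {'перевод': 1}  # словарь\n"
print(non_ascii_names(sample))  # ['словарь']
```

Because only NAME tokens are checked, this policy would still allow native-script comments and string literals, which matches the split between "literals" and "variable names" in the quoted issue list.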

As I said on c.l.p earlier, this is a very bad idea, and the Python Style
Guide says that if you expect people from other countries to read your
code, you should write your comments in English.

Exchanging programs will be nearly impossible if variable names are
in native languages!

In Russia, programmers who do not know English (do those exist?)
write variable names using transliteration. A self-explanatory example:

slovar = {'spisok': 'list',
          'slovar': 'dictionary',
          'element': 'item',
          'perevod': 'translation'}

spisok = ['element', 'perevod']
for element in spisok:
    try:
        perevod = slovar[element]
        print(perevod)
    except KeyError:  # word not in the dictionary
        print(element, "???")

This is ugly, of course, but names in native scripts would cause
more problems.

I, for example, navigate faster in English versions of software,
because translations differ: even a simple "Edit" has several variants.

So this "interoperability" would actually make interoperability harder, at
least for Open Source. For anything else, I do not care.

Sincerely yours, Roman A.Suzi
-- 
 - Petrozavodsk - Karelia - Russia - mailto:rnd at onego.ru -
 




