[I18n-sig] Strawman Proposal (2): Encoding attributes

M.-A. Lemburg mal@lemburg.com
Fri, 09 Feb 2001 22:55:46 +0100


Paul Prescod wrote:
> 
> "M.-A. Lemburg" wrote:
> >
> > ...
> >
> > The parser has no idea of what to do with Unicode input...
> > this would mean that we would have to make it Unicode
> > aware and this opens a new can of worms; not only in the case
> > where this encoding specifier is used.
> 
> Obviously the parser cannot be made unicode aware for Python 2.1 but why
> not for Python 2.2? What's so difficult about it? There's no rocket
> science.
> 
> Also, if we wanted a quick hack, couldn't we implement it at first by
> "decoding" to UTF-8? Then the parser could look for UTF-8 in Unicode
> string literals and translate those into real Unicode.

I don't want to do "quick hacks", so this is a non-option.

Making the parser Unicode aware is non-trivial, as it requires
changing many of the internals which expect 8-bit C char buffers.
It will eventually happen, but it is not a high priority since it
only serves a convenience, not a real need.
 
> > Also, string literals ("text") would have to translate the
> > Unicode input passed to the parser back to ASCII (or whatever
> > the default encoding is) and this would break code which currently
> > uses strings for data or some specific text encoding.
> 
> It would only break code that adds the encoding declaration. If you
> don't add the declaration you don't break any code!

If we change the parser to use Unicode, then we would
have to decode *all* program text into Unicode and this is very
likely to fail for people who put non-ASCII characters into their
string literals.
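
To make this concrete, here is a rough sketch (Python 2, default
ASCII codec; the literal is just an example):

    # a plain string literal containing a raw Latin-1 byte, as found
    # in plenty of existing source files:
    s = "caf\xe9"            # 0xE9 is 'e-acute' in Latin-1

    # decoding the surrounding program text with the default codec
    # would choke on that byte:
    unicode(s, "ascii")      # raises UnicodeError: ordinal not in range(128)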
 
> Plus, we all agree that passing binary data in literal strings should be
> a deprecated usage eventually. That's why we're inventing binary
> strings.

Yes, but this move needs time... binary strings are meant as an
easy-to-use alternative, so that programmers can easily make the
required changes to their code (adding a few b's in front of their
string literals won't hurt that much).
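
The kind of change needed would look something like this (note that
the b"..." prefix is the proposed syntax, not something that exists
today):

    header = "\x89PNG\r\n\x1a\n"     # today: 8-bit string holding binary data
    header = b"\x89PNG\r\n\x1a\n"    # proposed: explicit binary string literal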
 
> > ...
> > Sorry, Paul, but this will never happen. Python is an ASCII
> > programming language and does good at it.
> 
> I am amazed to hear you say that. Why SHOULDN'T we allow Chinese
> variables names some day? This is the 21st century. If we don't go after
> Asian markets someone else will! I've gotta admit that that kind of
> Euro-centric attitude sort of annoys me...

ASCII is not Euro-centric at all, since it is a common subset
of very many encodings in use today. Latin-1 would be, though...
which is why ASCII was chosen as the standard default encoding.
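
To illustrate what "common subset" means here (an interactive sketch
using the standard codecs):

    # pure ASCII text encodes to the same bytes under many codecs:
    >>> t = u"Python"
    >>> t.encode("ascii") == t.encode("latin-1") == t.encode("utf-8")
    True
    # a Latin-1 character does not:
    >>> u"\xe9".encode("latin-1") == u"\xe9".encode("utf-8")
    False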

The added flexibility in choosing identifiers would soon turn
against the programmers themselves. Others have tried this and
failed badly (e.g. look at the language-specific versions of
Visual Basic).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/