[I18n-sig] Strawman Proposal (2): Encoding attributes

Paul Prescod paulp@ActiveState.com
Fri, 09 Feb 2001 19:07:39 -0800


"M.-A. Lemburg" wrote:
> 
> > ...
> > Also, if we wanted a quick hack, couldn't we implement it at first by
> > "decoding" to UTF-8? Then the parser could look for UTF-8 in Unicode
> > string literals and translate those into real Unicode.
> 
> I don't want to do "quick hacks", so this is a non-option.

If it works and it is easy, there should not be a problem!
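
A minimal sketch of the quick hack being discussed, written in modern
Python 3 rather than the Python of this thread. The function names and
the crude literal extraction are hypothetical, for illustration only;
this is not how any real parser does it.

```python
def transcode_to_utf8(raw_source: bytes, declared_encoding: str) -> bytes:
    """Re-encode source bytes from the declared encoding into UTF-8."""
    return raw_source.decode(declared_encoding).encode("utf-8")


def unicode_literal_value(literal_body: bytes) -> str:
    """Turn the UTF-8 body of a u"..." literal into real Unicode."""
    return literal_body.decode("utf-8")


# A one-line program as it might be saved by a Latin-1 editor.
latin1_source = b'name = u"Andr\xe9"'
utf8_source = transcode_to_utf8(latin1_source, "latin-1")

# Crude extraction of the literal body (assumption: one literal, no escapes).
body = utf8_source.split(b'"')[1]
value = unicode_literal_value(body)
```

The rest of the parser keeps working on 8-bit buffers; only the code
that handles Unicode literals needs to know the buffer is UTF-8.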

> Making the parser Unicode aware is non-trivial as it requires
> changing lots of the internals which expect 8-bit C char buffers.

Are you talking about the Python internals or the parser internals? If
the former, then I do not think you are correct. Only the parser needs
to change.

> If we change the parser to use Unicode, then we would
> have to decode *all* program text into Unicode and this is very
> likely to fail for people who put non-ASCII characters into their
> string literals.

Files with no declaration could be interpreted byte-for-char, just as
they are today!
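
Byte-for-char interpretation can be illustrated with the Latin-1 codec,
which maps each byte to the code point with the same numeric value.
This is a sketch in modern Python 3, not the Python of this thread, and
the variable names are made up for the example.

```python
# Every possible byte value, 0 through 255.
all_bytes = bytes(range(256))

# Latin-1 decoding never fails: each byte becomes exactly one character.
as_text = all_bytes.decode("latin-1")

# Each character's code point equals the byte it came from.
same_values = all(ord(ch) == b for ch, b in zip(as_text, all_bytes))

# Encoding back gives the original bytes, so nothing is lost.
round_trip = as_text.encode("latin-1") == all_bytes
```

Because the mapping is total and reversible, a file with no declaration
cannot fail to decode under this rule.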

> ....
> ASCII is not Euro-centric at all since it is a common subset
> of very many common encodings which are in use today. 

Oh come on! The ASCII characters are sufficient to encode English and a
very few other languages.
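
Both points can be checked directly. The sketch below (modern Python 3;
the codec list and the helper name are arbitrary choices for the
example) shows that the 128 ASCII bytes decode identically under
several encodings, that a byte above 127 does not, and that ASCII
cannot represent text outside its repertoire.

```python
# The 128 byte values that ASCII defines.
ascii_bytes = bytes(range(128))

# A few ASCII-compatible codecs shipped with Python.
codecs_to_check = ["ascii", "latin-1", "utf-8", "cp1252", "koi8_r", "iso8859_7"]
decoded = {name: ascii_bytes.decode(name) for name in codecs_to_check}

# True when every codec produced the same text for the ASCII range.
ascii_is_common_subset = len(set(decoded.values())) == 1

# Above 127 the codecs disagree about what a byte means.
high_byte = b"\xe9"
latin1_char = high_byte.decode("latin-1")
koi8_char = high_byte.decode("koi8_r")


def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text is an ASCII character."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False
```

For instance, fits_in_ascii("hello") is true, while
fits_in_ascii("na\u00efve") is false.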

> Latin-1
> would be, though... which is why ASCII was chosen as standard
> default encoding.

We could go back and forth on this, but let me suggest you type in a
program with Latin-1 in your Unicode literals and see what happens.
Python already "recognizes" that there is a single logical translation
from "old style strings" to Unicode strings and vice versa.

> The added flexibility in choosing identifiers would soon turn
> against the programmers themselves. Others have tried this and
> failed badly (e.g. look at the language specific versions of
> Visual Basic).

That's a totally different and unrelated issue. Nobody is talking about
language-specific Pythons. We're talking about allowing people to name
variables in their own languages. I think that anything else is
Euro-centric.

 Paul Prescod