[I18n-sig] Re: [Python-Dev] unichr

Paul Prescod paulp@ActiveState.com
Thu, 8 Feb 2001 07:31:21 -0800 (PST)


On Thu, 8 Feb 2001, M.-A. Lemburg wrote:

> You are forgetting that the range 128-255 is used by many codepages
> to support language specific characters.

No, I'm not forgetting that. I just don't think it is relevant.

> chr(0xE0) will give different
> characters in the US than e.g. in Russia. If we were to simply
> let these conversions slip through, then people would find garbled
> data in their text files.

People in Russia understand the concept of code pages. They know that
if they put "special" characters in their files they will be interpreted
on other platforms as Western European characters. If we make it easy for
them to explicitly state their encoding then they will do so and get better
behavior than they did before. We can also simplify Python and remove an
arbitrary restriction at the same time.
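
To make the code page point concrete, here is a minimal sketch of how the
single byte 0xE0 reads under three different encodings. It is written for
present-day Python 3 rather than the interpreter discussed in this thread,
and the names RAW, CODE_PAGES and decode_everywhere are hypothetical, chosen
only for illustration.

```python
import unicodedata

# Illustration only: present-day Python 3, not the Python 2.x of this thread.
RAW = bytes([0xE0])  # the byte that chr(0xE0) stands for in the quoted text

# Example codecs (an assumption of this sketch); all three ship with CPython.
CODE_PAGES = ["latin-1", "cp1251", "koi8-r"]


def decode_everywhere(raw, encodings=CODE_PAGES):
    """Return {encoding name: decoded text} for the same raw bytes."""
    return {name: raw.decode(name) for name in encodings}


if __name__ == "__main__":
    for name, text in decode_everywhere(RAW).items():
        # Print the code point and its Unicode name, which are ASCII-safe.
        print(f"{name:8} U+{ord(text):04X} {unicodedata.name(text)}")
```

The byte never changes; only the stated encoding decides which character it
becomes, which is why stating the encoding explicitly matters.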

> Of course, if a user explicitly sets the default encoding to
> Latin-1, then everything will be fine, but for ASCII (which is
> the base of most character encodings in use today) there is
> little other we can do except to raise an exception.

I don't think the "default encoding" is a relevant concept. Most people
came out strongly against it on the Python lists and it was hidden from
user view for that reason. It is a terrible idea to encourage people to
write software that works right on their computer but not on anyone
else's. I think that we should view the "default encoding" as an
implementation artifact and nothing more. We need to define portable rules
that will consistently make sense everywhere.
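
As a rough sketch of the two behaviours being weighed here, the snippet below
contrasts a strict ASCII fallback, which rejects the byte 0xE0, with an
explicitly stated encoding, which gives the same answer on every machine. It
assumes present-day Python 3, and decode_text is a hypothetical helper, not
part of any Python API.

```python
def decode_text(raw, encoding=None):
    """Decode bytes; with no stated encoding, fall back to strict ASCII."""
    # Hypothetical helper: "ascii" is this sketch's assumed fallback.
    return raw.decode(encoding or "ascii")


data = bytes([0xE0])

# No encoding stated: ASCII cannot represent 0xE0, so decoding fails.
try:
    decode_text(data)
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Encoding stated by the author of the data: the result does not depend
# on the locale or configuration of the machine running the code.
explicit = decode_text(data, "cp1251")
```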

 Paul Prescod