[I18n-sig] Re: [Python-Dev] Pre-PEP: Python Character Model

Paul Prescod paulp@ActiveState.com
Tue, 06 Feb 2001 16:21:50 -0800


"Martin v. Loewis" wrote:
> 
> >  a) when you try to convert a character greater than 128. In my opinion
> > this is just a poor design decision that can be easily reversed
> 
> Technically, you can easily expand it to 256; not that easily
> beyond.

Beyond that is like putting a long integer into a 32 bit integer slot.
It's a TypeError.

> Then, people who put KOI8-R into their Python source code will
> complain why the strings come out incorrectly, even though they set
> their language to Russian, and even though it worked that way in
> earlier Python versions.

I don't follow.

If I have:

a="abcXXXdef"

where XXX is a series of non-ASCII bytes. Those are mapped into Unicode
characters with the same ordinals. Now you write them to a file. You
presumably do not specify an encoding on the file write operation, so
the characters get mapped back to bytes with the same ordinals. It all
behaves as it did in Python 1.0 ...

You can only introduce characters greater than 256 into strings
explicitly, and legacy code presumably does not do that because there
was no way to do it!
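The round trip described above can be sketched in modern Python
spelling (a hypothetical illustration of the Latin-1 semantics being
argued for, not code from the original discussion): each byte decodes
to the Unicode character with the same ordinal, and re-encoding
reproduces the original bytes exactly.

```python
# Hypothetical example data: ASCII text with some non-ASCII bytes mixed in.
raw = b"abc\xe9\xf8def"

# Decode each byte to the Unicode character with the same ordinal.
text = raw.decode("latin-1")
assert all(ord(ch) < 256 for ch in text)

# Encode back: each ordinal below 256 maps to the same byte,
# so nothing is lost and no error is raised.
back = text.encode("latin-1")
assert back == raw
```

Characters with ordinals of 256 or more cannot appear in such a string
unless introduced explicitly, which is the point being made above.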

> > I think a lot of Unicode interoperability problems would just go
> > away if "a" was fixed...
> 
> No, that would just open a new can of worms.
> 
> Again, provide a specific patch, and I can tell you specific problems.

It isn't the appropriate time to create such a core code patch. I'm
trying to figure out our direction so that we can figure out what can
be done in the short term. The only two things I can think of are
merging chr/unichr (easy) and providing encoding-smart alternatives to
open() and read() (also easy). The encoding-smart alternatives should
also be documented as the preferred replacements as soon as possible.
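Both proposals eventually landed in Python 3, so they can be sketched
there (a hedged illustration using today's spelling, not the patch
under discussion in 2001): chr() accepts the full Unicode range, and
the built-in open() takes an encoding argument so reads and writes
deal in characters rather than raw bytes. The helper name open_text is
purely illustrative.

```python
import io
import os
import tempfile

def open_text(path, mode="r", encoding="utf-8"):
    """Illustrative 'encoding-smart' open: the returned file object
    converts between characters and bytes internally."""
    return io.open(path, mode, encoding=encoding)

# chr() merged with unichr(): ordinals above 255 just work.
assert chr(0x416) == "\u0416"

# Round-trip a string containing a character above 256 through a file.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open_text(path, "w") as f:
    f.write("caf\u00e9 \u0416")
with open_text(path) as f:
    assert f.read() == "caf\u00e9 \u0416"
```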

 Paul Prescod